Abstract

The ensemble properties of Random Vector Quantization (RVQ) codebooks for limited-feedback beamforming in multi-input multi-output (MIMO) systems are studied. The metrics of interest are the received SNR loss and the mutual information loss, both relative to a perfect channel state information (CSI) benchmark. The simplest case of unskewed codebooks is studied in the correlated MIMO setting. These loss metrics are computed as a function of the number of bits of feedback ($B$), the transmit antenna dimension ($N_t$), and the spatial correlation. In particular, it is established that:

i) the loss metrics are a product of two components, a quantization component and a channel-dependent component;
ii) the quantization component, which is also common to the analysis of channels with independent and identically distributed (i.i.d.) fading, decays as $B$ increases at the rate $2^{-B/(N_t-1)}$; and
iii) the channel-dependent component reflects the condition number of the channel.

Further, the precise connection between the received SNR loss and the squared singular values of the channel is shown to be a Schur-convex majorization relationship. Finally, the ensemble properties of skewed codebooks are studied; these are generated by skewing RVQ codebooks with an appropriately designed fixed skewing matrix. Based on an estimate of the loss expression for skewed codebooks, it is established that the optimal skewing matrix depends critically on two condition numbers: that of the effective channel (the product of the true channel and the skewing matrix) and that of the skewing matrix.

V. Raghavan is currently with the University of Southern California, Los Angeles, CA 90089, USA. He was with the 
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA when this work 
was done. V. V. Veeravalli is with the Coordinated Science Laboratory and the Department of Electrical and Computer 
Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Email: vasanthan__raghavan@ieee.org, 
VVV@illinois.edu. * Corresponding author. 

This work was supported in part by the National Science Foundation through grant CNS-0831670. This paper was presented 
in part at the IEEE International Symposium on Information Theory, Seoul, South Korea, 2009. 



I. Introduction 

Optimal signalling to maximize the achievable rate in multi-input multi-output (MIMO) communication channels requires appropriate adaptation of the number of transmit data-streams. This adaptation responds to the SNR, the channel correlation, and the channel state information (CSI) available at the transmitter and the receiver [1], [2]. On the other hand, an increase in the number of transmit data-streams significantly increases the number of radio-frequency (RF) link chains, with a corresponding increase in complexity and cost [3]. Thus, in many later-generation (3G/4G and beyond) cellular standards such as WiMAX and 3GPP-LTE, low-complexity signalling alternatives are preferred. In particular, beamforming is an attractive choice due to its low complexity; here, the number of transmit data-streams is fixed to one, independent of the SNR, channel correlation or CSI. Beamforming is also preferred in three further settings:
- when the central goal is to maximize the coverage area/range of signalling;
- in the 60 GHz regime [4], where a large number of small antennas can be packed in a fixed area to reap the array gain possible with beamforming; and
- as a mechanism for cross-layer signalling in ad-hoc networks.
Background: The performance achieved with a beamforming scheme clearly depends on the quality of the CSI available at both the transmitter and the receiver. Perfect CSI at the receiver is a reasonable assumption for practical systems. At the transmitter, however, constraints on channel tracking and on the quality of feedback make perfect CSI an optimistic assumption. Nevertheless, low-rate reverse-link feedback from the receiver to the transmitter is possible, and this has made limited-feedback systems popular [5], [6]. In such systems, $B$ bits of channel quality information are fed back to the transmitter. In beamforming systems, the common way to use this feedback resource is to design a codebook of $2^B$ beamforming vectors. The receiver then feeds back the index of the best codeword from the codebook over each coherence period [5], [6].

Given a channel correlation profile, the problem of optimal design of $B$-bit codebooks is ill-posed (in general) and hence difficult. In the special case of channels with independent and identically distributed (i.i.d.) fading, Grassmannian constructions have been proposed in [7] and [8]; these are designed to maximize the minimum distance between beamforming vectors. The intuition behind this proposal is that the dominant right singular vector of an i.i.d. channel is isotropically (uniformly) distributed. It lies in the space of $N_t$-dimensional unit-norm beamforming vectors, where $N_t$ is the number of transmit antennas. Thus, a "good" limited-feedback codebook is an efficient quantization of this ambient space. Grassmannian codebooks are obtained via algebraic techniques [9]-[11], and they cannot be constructed for some $(N_t, B)$-combinations.

To overcome this difficulty, and inspired by the random coding argument, Random Vector Quantization (RVQ) codebooks have also been proposed in the literature [12]. RVQ codebooks were first introduced for signature matrix quantization in Code-Division Multiple Access (CDMA) systems [13], [14]. In contrast to Grassmannian codebooks, RVQ codebooks are instantiations of random constructions, and their beamforming vectors are isotropic and i.i.d. over the ambient space. Thus, RVQ codebooks can be designed for all $(N_t, B)$-combinations, and their design has low complexity. Many recent papers have extended the intuition behind RVQ codebook design to the multi-user setting with i.i.d. fading [15]-[19].

Consider the general single-user setting, where the channel matrix is spatially correlated and the dominant right singular vector of the channel has certain preferred directions. Here, Grassmannian codebooks are mismatched and hence sub-optimal. In fact, [20, Figs. 6 and 7] and [21] give illustrative examples where Grassmannian codebooks suffer dramatic performance losses (on the order of 25 dB in SNR) relative to the perfect CSI benchmark. For these situations, more complicated (in terms of design) spherical Vector Quantization (VQ) constructions based on the Lloyd algorithm have been proposed [22]-[24]. While VQ codebooks are optimal^1, they offer little insight into the structure of the optimal codebook. To overcome these difficulties, rotation- and scaling-based codebooks have been proposed [20], [25]-[29]. These have been shown to improve performance significantly over Grassmannian codebooks. The main idea behind these constructions is to finely quantize the local neighborhood around the statistically dominant eigen-directions. Elsewhere, they quantize coarsely, if $B$ is large enough to afford this possibility.

Towards the eventual goal of an optimal codebook construction, it is imperative to understand the performance of existing codebook designs. It is equally important to identify the merits and demerits of existing schemes with respect to fundamental limits on performance. In this direction, the performance of an ensemble of RVQ codebooks has been studied for:
i) i.i.d. multi-input single-output (MISO) channels [12], [15], [30];
ii) correlated MISO channels in the asymptotic-$B$ regime, via high-resolution quantization theory [31], [32];
iii) i.i.d. MIMO channels, via bounds [33], [34];
iv) i.i.d. MISO and MIMO channels in the large-antenna regime, via extreme order statistics [12], [35]; and
v) the symbol error rate of limited-feedback beamforming in an i.i.d. MISO setting [36], [37].

For RVQ codebooks in MISO channels, both exact expressions and asymptotic approximations (in $B$) are available, in both the i.i.d. and correlated settings. These studies show that the loss metrics decay at a rate of the order of $2^{-B/(N_t-1)}$ as $B$ increases. In the MIMO setting, however, performance analysis is available only for the i.i.d. case in the large-antenna regime. Further, since reverse-link feedback is a valuable resource, the practically relevant regime is small $B$. The literature has paid little to no attention to performance analysis in this regime. More importantly, to the best of our knowledge, the performance of non-RVQ codebooks has not been studied at all. It is therefore of interest to understand the ensemble properties of RVQ codebooks in the most general correlated setting, for practically relevant values of $B$. The same question applies to codebooks that are designed from RVQ codebooks and tailored for correlated channels.



^1 Technically, VQ codebooks meet the necessary conditions for an optimal codebook construction, but not the sufficient condition. Nevertheless, it is widely believed that VQ constructions are optimal.



Contributions: The main goal of this work is to study the performance of a $B$-bit RVQ codebook in correlated MIMO channels. The metrics of interest are the received SNR loss ($\Delta\mathrm{SNR}_{\mathrm{rx}}$) and the loss in average mutual information ($\Delta I$), both relative to a perfect CSI scheme. We first average the loss metric (with a fixed channel realization) over the randomness in the RVQ codebook structure, and then average over the randomness in the channel. Along the way, we identify the structure of the density function of the weighted norm of isotropically distributed unit-norm vectors. With this information, we obtain closed-form expressions for $\Delta\mathrm{SNR}_{\mathrm{rx}}$ and $\Delta I$, modulo the averaging over channel randomness. The fundamental contributions of this work are three-fold:
i) the loss expressions are accurate for small values of $B$ across a large family of channels;
ii) they are asymptotically tight in $B$, and the rate of decay with $B$ is still $2^{-B/(N_t-1)}$ in correlated MIMO channels; and
iii) they capture the impact of the channel correlation structure on the performance of RVQ codebooks.

Further, we show that $\Delta\mathrm{SNR}_{\mathrm{rx}}$ is a Schur-convex function of the squared singular values of the channel. This establishes a continuous mapping from the space of all majorizable channels to the performance loss of the RVQ codebook in each channel. An important consequence is that a well-conditioned channel leads to the smallest value of $\Delta\mathrm{SNR}_{\mathrm{rx}}$, whereas a rank-1 channel leads to the largest. The performance loss of the RVQ codebook relative to a perfect CSI scheme increases in two situations: as the rank of the channel decreases, and as the condition number of its non-trivial singular values increases.

Intuitively, RVQ codebooks are isotropic constructions, whereas perfect CSI beamforming skews the signal along the dominant right singular vector of the channel. Thus, RVQ codebooks are best matched to a channel whose dominant right singular vector is isotropically distributed (an i.i.d. channel). They are poorly matched to a channel whose dominant right singular vector has a fixed direction (a rank-1 channel). This intuition mirrors the source-channel matching principle for statistical semiunitary precoding established in one of our prior works [21]. Majorization only induces a partial ordering on the family of all channels. We therefore also show that the dominant squared singular value of the channel is a simplified metric that approximately orders and compares RVQ performance across all channels.

Recent interest in the limited-feedback literature [25], [26] has been on the design of skewed codebooks, where a fixed skewing matrix is used to skew an RVQ codebook (or a Grassmannian codebook). The skewing matrix biases the isotropic beamforming vectors in the RVQ codebook and orients them along its singular vectors. With a suitable choice of the skewing matrix, significant performance improvement over the RVQ scheme can be achieved. Despite these observations, technical challenges have kept the literature from addressing the performance analysis of skewed codebooks. In the last part of this paper, we overcome this challenge and generalize our characterization of the ensemble properties of RVQ codebooks to skewed codebooks.

Our result expresses the received SNR loss in terms of the skewing matrix, which yields insights into the structure of the optimal skewing matrix for limited-feedback beamforming. Our study establishes that two condition numbers play a critical role here: that of the effective channel (the product of the true channel matrix and the skewing matrix) and that of the skewing matrix. Building on this insight, we construct a class of skewed codebooks that match the left singular vectors of the skewing matrix with the dominant eigen-directions of the transmit covariance matrix of the channel. Numerical studies show that these skewed codebooks significantly outperform RVQ codebooks and are better than the codebooks proposed in [25], [26].

Organization: This paper is organized as follows.
- Section II introduces the limited-feedback beamforming setup.
- Section III studies the received SNR loss with an ensemble of RVQ codebooks in the most general (correlated MIMO) setting.
- Section IV focuses on ordering (comparing) channels with respect to the received SNR loss metric. It presents a partial ordering in the form of a majorization result, as well as an approximate complete ordering.
- Section V studies the mutual information loss with RVQ codebooks.
- Section VI extends the analysis of Sec. III to the skewed codebook setting.
- Section VII provides concluding remarks.

Proofs of most of the results are relegated to the Appendices.
Notations:
- Vectors and matrices: Upper- and lower-case bold symbols denote matrices and vectors, respectively. The $i$-th element of a vector $\mathbf{x}$ is denoted by $\mathbf{x}(i)$, and its two-norm by $\|\cdot\|$. The Hermitian transpose of a matrix is denoted by $(\cdot)^H$. The trace and rank operators are denoted by $\mathrm{Tr}(\cdot)$ and $\mathrm{rank}(\cdot)$, respectively.
- Eigenvalues: The eigenvalues of an $N_t \times N_t$ positive semi-definite matrix $\mathbf{M}$ are arranged in decreasing order as $\lambda_1(\mathbf{M}) \geq \cdots \geq \lambda_{N_t}(\mathbf{M})$. When there is no ambiguity about the matrix, we often write this simply as $\lambda_1 \geq \cdots \geq \lambda_{N_t}$.
- Condition number: If $\mathbf{M}$ is full-rank, its squared condition number is defined as $\chi_{\mathbf{M}} \triangleq \lambda_1(\mathbf{M})/\lambda_{N_t}(\mathbf{M})$. We loosely say that $\mathbf{M}$ is ill- (or well-)conditioned depending on whether $\chi_{\mathbf{M}}$ is (or is not) significantly larger than 1.
- Probability: The indicator function and the probability of an event are denoted by $\mathbb{1}(\cdot)$ and $\Pr(\cdot)$, and the expectation operator by $\mathrm{E}[\cdot]$.
- Reserved symbols: $\mathcal{C}$ denotes limited-feedback codebooks, $B$ the number of bits of feedback, $C_i$ constants in theoretical statements/results, $\mathbf{I}$ the identity matrix, and $\mathrm{diag}(\cdot)$ a diagonal matrix.
- Fields: $\mathbb{C}$ and $\mathbb{R}$ denote the complex and real fields. $\mathbb{R}_+^n$ and $\mathbb{R}_+$ denote the positive real fields of $n$ and 1 dimensions, respectively.
- Asymptotics: As $B \to \infty$, $f(B) \asymp g(B)$ means $\lim_{B \to \infty} f(B)/g(B) = 1$, and the little-oh notation $f(B) = o(g(B))$ means $\lim_{B \to \infty} f(B)/g(B) = 0$.

II. Beamforming Setup 

We consider a communication system with $N_t$ transmit and $N_r$ receive antennas, where one data-stream is used for signalling. The baseband model is given by

$$\mathbf{y} = \sqrt{\rho}\, \mathbf{H}\mathbf{f}s + \mathbf{n} \tag{1}$$

The terms in (1) are as follows:
- $\rho$ is the transmit power constraint;
- the complex Gaussian input $s$ is i.i.d. with zero mean and unit energy;
- $\mathbf{H}$ is the $N_r \times N_t$ channel matrix;
- $\mathbf{n}$ is the $N_r$-dimensional proper complex additive white Gaussian noise; and
- $\mathbf{f}$ is a vector on the complex Grassmann manifold $\mathcal{G}(N_t, 1)$. That is, $\mathbf{f}$ is an $N_t \times 1$ unit-norm vector representing the equivalence class $\{\mathbf{f}e^{j\theta} : \theta \in [0, 2\pi)\}$.

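The main emphasis in this work is on the impact of the channel matrix on limited-feedback performance. For this, we assume that the channel evolves according to a block-fading, narrowband model. We further assume a Rayleigh fading (zero-mean complex Gaussian) model for the channel coefficients. The second-order statistics are described via a general, mathematically tractable decomposition of the channel [38]:

$$\mathbf{H} = \mathbf{U}_r \mathbf{H}_{\mathrm{ind}} \mathbf{U}_t^H \tag{2}$$

Here, $\mathbf{H}_{\mathrm{ind}}$ has independent, but not necessarily identically distributed, entries. $\mathbf{U}_t$ and $\mathbf{U}_r$ are unitary matrices that serve as eigen-bases for the transmit and the receive covariance matrices ($\mathbf{\Sigma}_t$ and $\mathbf{\Sigma}_r$), respectively. The covariance matrices are defined as

$$\mathbf{\Sigma}_t \triangleq \mathrm{E}\left[\mathbf{H}^H\mathbf{H}\right] = \mathbf{U}_t\, \mathrm{E}\left[\mathbf{H}_{\mathrm{ind}}^H\mathbf{H}_{\mathrm{ind}}\right] \mathbf{U}_t^H \tag{3}$$

$$\mathbf{\Sigma}_r \triangleq \mathrm{E}\left[\mathbf{H}\mathbf{H}^H\right] = \mathbf{U}_r\, \mathrm{E}\left[\mathbf{H}_{\mathrm{ind}}\mathbf{H}_{\mathrm{ind}}^H\right] \mathbf{U}_r^H. \tag{4}$$

Two well-known models are special cases of (2):
- the Kronecker-product correlation model, where $\mathbf{H}_{\mathrm{ind}} = \mathbf{\Lambda}_r^{1/2}\mathbf{H}_{\mathrm{iid}}\mathbf{\Lambda}_t^{1/2}$ and $\mathbf{H}_{\mathrm{iid}}$ denotes an i.i.d. channel matrix; and
- the virtual representation, where $\mathbf{U}_t$ and $\mathbf{U}_r$ are Fourier matrices.

Readers are referred to [38], [39] for a detailed study of channel modeling issues.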

We study the coherent case with perfect CSI at the receiver. With beamforming, both ergodic capacity and (uncoded) error probability are captured by the received SNR, defined as

$$\mathrm{SNR}_{\mathrm{rx}} = \rho \cdot \mathbf{f}^H\mathbf{H}^H\mathbf{H}\mathbf{f}. \tag{5}$$

Suppose perfect CSI (knowledge of the channel realization $\mathbf{H}$) is also available at the transmitter. Then the optimal beamforming vector $\mathbf{f}_{\mathrm{opt}}$ on $\mathcal{G}(N_t, 1)$, the one that maximizes the received SNR, is $\mathbf{u}_{\mathbf{H}}$. This is the dominant right singular vector of $\mathbf{H}$, which is also the dominant eigenvector of $\mathbf{H}^H\mathbf{H}$. In this case, the received SNR is $\rho\lambda_1$, where $\lambda_1$ is the dominant eigenvalue of $\mathbf{H}^H\mathbf{H}$.
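The sketch below numerically illustrates the perfect-CSI benchmark. It finds $\mathbf{u}_{\mathbf{H}}$ by plain power iteration on $\mathbf{H}^H\mathbf{H}$ and checks that the resulting gain equals $\lambda_1$. Power iteration and the helper names (`gram`, `quad_form`, `dominant_eigvec`) are our own illustrative choices, not part of the paper. Matrices are stored as Python lists of rows, so only the standard library is needed.

```python
import math
import random

def gram(h):
    """Return H^H H for H stored as a list of rows (N_r x N_t, complex entries)."""
    nr, nt = len(h), len(h[0])
    return [[sum(h[r][i].conjugate() * h[r][j] for r in range(nr)) for j in range(nt)]
            for i in range(nt)]

def quad_form(a, f):
    """Real part of f^H A f; for A = H^H H this is the beamforming gain in (5)."""
    af = [sum(a[i][j] * f[j] for j in range(len(f))) for i in range(len(a))]
    return sum(fi.conjugate() * yi for fi, yi in zip(f, af)).real

def dominant_eigvec(a, iters=1000, seed=0):
    """Power iteration (an illustrative choice) for the dominant eigenvector of A = H^H H."""
    rng = random.Random(seed)
    x = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(len(a))]
    for _ in range(iters):
        y = [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]
        norm = math.sqrt(sum(abs(v) ** 2 for v in y))
        x = [v / norm for v in y]
    return x

# Toy channel with H^H H = diag(4, 1): u_H = e_1 up to a phase, and lambda_1 = 4.
h = [[2 + 0j, 0j], [0j, 1 + 0j]]
a = gram(h)
u = dominant_eigvec(a)
lam1 = quad_form(a, u)
print(round(lam1, 6))   # 4.0
```

Power iteration is used only because it is short and dependency-free. Any eigen-solver returning the dominant eigenvector of $\mathbf{H}^H\mathbf{H}$ would serve equally well.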

However, perfect CSI is hard to obtain at the transmitter end in practice. Thus, as motivated in Sec. I, we assume a $B$-bit limited-feedback model for the reverse link. We need the following definition to introduce the codebook model.

Definition 1 (Exchangeable & Isotropic random variables): A family of random variables $X_1, \ldots, X_n$ is said to be exchangeable if the joint distribution is invariant to the set of permutations over $\{1, \ldots, n\}$. That is,

$$\Pr(X_1, \ldots, X_n \in \Theta) = \Pr(X_{\pi_1}, \ldots, X_{\pi_n} \in \Theta) \tag{6}$$

for all permutations $\Pi = [\pi_1, \ldots, \pi_n]$ and any $\Theta$ in the range space of $\{X_1, \ldots, X_n\}$. A family of i.i.d. random variables is exchangeable. Exchangeable random variables are identically distributed [40].



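A random $N_t \times 1$ unit-norm vector $\mathbf{f}$ is said to be isotropic if its distribution is invariant to pre- and post-multiplication by unitary matrices. That is,

$$\Pr(\mathbf{f} \in \Theta) = \Pr(e^{j\theta}\mathbf{U}\mathbf{f} \in \Theta) \tag{7}$$

This holds for all $N_t \times N_t$ unitary matrices $\mathbf{U}$, all $\theta \in [0, 2\pi)$, and all $\Theta$ in the range space $\mathcal{G}(N_t, 1)$. In particular, the distribution function of an $N_t \times 1$ isotropic beamforming vector is given as [41]

$$\Pr(\mathbf{f} \in \Theta) = \frac{\Gamma(N_t)}{\pi^{N_t}} \int_{\mathbf{f} \in \Theta} \delta(\mathbf{f}^H\mathbf{f} - 1)\, d\mathbf{f} \tag{8}$$

where $\delta(\cdot)$ stands for the Dirac delta operator and

$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt, \quad \mathrm{Re}(x) > 0, \tag{9}$$

stands for the Gamma function, extended to $\mathbb{C}$ (minus its singularities). ■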
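One standard way to draw an isotropic vector is to normalize a vector of i.i.d. $\mathcal{CN}(0,1)$ entries; we assume this construction here rather than take it from the paper. The complex Gaussian law is invariant to $e^{j\theta}\mathbf{U}$, so the normalized vector satisfies (7). The sketch below uses a hypothetical helper `isotropic_unit_vector`. It also checks numerically that all entries $|\mathbf{f}(k)|^2$ share the same mean $1/N_t$, as exchangeability requires.

```python
import math
import random

def isotropic_unit_vector(nt, rng):
    """Hypothetical helper: isotropic unit-norm vector on G(N_t, 1).

    Normalizes an i.i.d. CN(0, 1) vector; unitary invariance of the complex
    Gaussian law makes the result satisfy (7)."""
    g = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(nt)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in g))
    return [z / norm for z in g]

rng = random.Random(0)
nt, trials = 4, 20000
means = [0.0] * nt
for _ in range(trials):
    f = isotropic_unit_vector(nt, rng)
    for k in range(nt):
        means[k] += abs(f[k]) ** 2 / trials   # empirical E|f(k)|^2

print([round(v, 3) for v in means])   # each entry close to 1/N_t = 0.25
```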
In this work, we assume that a $B$-bit RVQ codebook, $\mathcal{C} = \{\mathbf{f}_i,\ i = 1, \ldots, 2^B\}$, is known a priori at both ends. The beamforming vectors in $\mathcal{C}$ are isotropic and i.i.d. over $\mathcal{G}(N_t, 1)$. The index $i^\star$ of the codeword that maximizes the received SNR,

$$i^\star = \arg\max_{i = 1, \ldots, 2^B} \mathbf{f}_i^H\mathbf{H}^H\mathbf{H}\mathbf{f}_i, \tag{10}$$

is fed back using $B$ bits. We assume that there is no error or delay in feeding the index back.
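The codebook model and the feedback rule (10) can be sketched as follows. The sketch assumes the Gaussian-normalization construction of isotropic vectors, and the helper names are hypothetical. Indices are 0-based in the code, whereas the text numbers codewords from 1.

```python
import math
import random

def isotropic_unit_vector(nt, rng):
    # Normalized i.i.d. CN(0, 1) vector: isotropic on G(N_t, 1).
    g = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(nt)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in g))
    return [z / norm for z in g]

def rvq_codebook(nt, bits, rng):
    """B-bit RVQ codebook: 2^B i.i.d. isotropic unit-norm beamforming vectors."""
    return [isotropic_unit_vector(nt, rng) for _ in range(2 ** bits)]

def beamforming_gain(h, f):
    """f^H H^H H f, computed as ||H f||^2 with H stored as a list of rows."""
    return sum(abs(sum(row[j] * f[j] for j in range(len(f)))) ** 2 for row in h)

def feedback_index(h, codebook):
    """Index i* in (10): the codeword that maximizes the received SNR."""
    return max(range(len(codebook)), key=lambda i: beamforming_gain(h, codebook[i]))

rng = random.Random(7)
nr, nt, bits = 2, 3, 3
h = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(nt)] for _ in range(nr)]
codebook = rvq_codebook(nt, bits, rng)
i_star = feedback_index(h, codebook)
print(i_star, beamforming_gain(h, codebook[i_star]))   # B = 3 bits index one of 8 codewords
```

Only `i_star` would travel over the reverse link. The transmitter then uses `codebook[i_star]` from its own copy of the codebook.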

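Since an RVQ codebook is random by construction, our interest is in the average properties of an ensemble of RVQ codebooks. We desire to compute the following quantities:

$$\Delta\mathrm{SNR}_{\mathrm{rx}} \triangleq \mathrm{E}_{\mathcal{C}}\left[\mathrm{E}_{\mathbf{H}}\left[\frac{\lambda_1 - \max_{i=1,\ldots,2^B} \mathbf{f}_i^H\mathbf{H}^H\mathbf{H}\mathbf{f}_i}{\lambda_1}\right]\right] \tag{11}$$

$$\Delta I \triangleq \mathrm{E}_{\mathcal{C}}\left[\mathrm{E}_{\mathbf{H}}\left[I_{\mathrm{perf}} - I_{\mathrm{lim}}\right]\right] \tag{12}$$

The received SNR loss, $\Delta\mathrm{SNR}_{\mathrm{rx}}$, is the normalized received SNR loss relative to a perfect CSI scheme, averaged over channel randomness and then ensemble-averaged over the family of RVQ codebooks. The quantity $\Delta I$ is the ensemble average of the loss in average mutual information. In (12), $I_{\mathrm{perf}}$ and $I_{\mathrm{lim}}$ denote the mutual information^2 achievable with channel realization $\mathbf{H}$ under perfect CSI and under limited feedback (using the feedback metric in (10)), respectively:

$$I_{\mathrm{perf}} = \log\left(1 + \rho\lambda_1\right) \tag{13}$$

$$I_{\mathrm{lim}} = \log\left(1 + \rho \max_{i=1,\ldots,2^B} \mathbf{f}_i^H\mathbf{H}^H\mathbf{H}\mathbf{f}_i\right) \tag{14}$$

where $\lambda_1 \geq \cdots \geq \lambda_{N_t}$ are the eigenvalues of $\mathbf{H}^H\mathbf{H}$ in decreasing order.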



^2 All logarithms are to base 2, unless specified otherwise.
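For a single channel realization, the inner quantities in (11) and (12) reduce to the normalized SNR loss and to $I_{\mathrm{perf}} - I_{\mathrm{lim}}$ from (13)-(14). The sketch below evaluates both for a toy channel with $\mathbf{H}^H\mathbf{H} = \mathrm{diag}(4, 1, 0)$ and a hypothetical two-codeword codebook. The function names, the channel, and $\rho = 10$ are illustrative assumptions. Logarithms are to base 2, as stated in the footnote above.

```python
import math

def beamforming_gain(h, f):
    """f^H H^H H f computed as ||H f||^2 (H stored as a list of rows)."""
    return sum(abs(sum(row[j] * f[j] for j in range(len(f)))) ** 2 for row in h)

def realization_losses(h, lam1, codebook, rho):
    """Normalized SNR loss (inner term of (11)) and I_perf - I_lim in bits ((13)-(14))."""
    best = max(beamforming_gain(h, f) for f in codebook)
    snr_loss = (lam1 - best) / lam1
    mi_loss = math.log2(1 + rho * lam1) - math.log2(1 + rho * best)
    return snr_loss, mi_loss

h = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # H^H H = diag(4, 1, 0), lambda_1 = 4
s = 1 / math.sqrt(2)
codebook = [[s, s, 0.0], [0.0, 0.0, 1.0]]   # hypothetical 1-bit codebook
snr_loss, mi_loss = realization_losses(h, 4.0, codebook, rho=10.0)
print(snr_loss, mi_loss)   # best gain is 2.5, so the SNR loss is about 0.375
```

Averaging these two numbers over channel draws and over random codebooks gives Monte Carlo estimates of (11) and (12).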



III. Received SNR Loss 

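The goal of this section is to produce a tractable characterization of $\Delta\mathrm{SNR}_{\mathrm{rx}}$ as defined in (11). For this, note that a simple Fubini argument lets us change the order of expectation in (11) (and (12)). Thus, conditioned on a particular realization $\mathbf{H}$ of the channel, we seek to compute the following average:

$$\Delta_1 \triangleq \mathrm{E}_{\mathcal{C}}\left[\frac{\lambda_1 - \max_{i=1,\ldots,2^B} \mathbf{f}_i^H\mathbf{H}^H\mathbf{H}\mathbf{f}_i}{\lambda_1}\right]. \tag{15}$$

We then average $\Delta_1$ over $\mathbf{H}$ to obtain $\Delta\mathrm{SNR}_{\mathrm{rx}}$.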



A. Equivalent Characterization of $\Delta_1$

Lemma 1:

• If $\{\mathbf{f}_i\}$ are isotropic on $\mathcal{G}(N_t, 1)$, the family of random variables

$$\left\{|\mathbf{f}_i(k)|^2,\ k = 1, \ldots, N_t\right\} \tag{16}$$

is exchangeable for any fixed $i$. Recall that $\mathbf{f}_i(k)$ is the $k$-th element of $\mathbf{f}_i$.

• Further, for a given fixed channel realization $\mathbf{H}$, the family of random variables $\{x_i,\ i = 1, \ldots, 2^B\}$, where $x_i = \mathbf{f}_i^H\mathbf{H}^H\mathbf{H}\mathbf{f}_i$, is i.i.d. over its range $[\lambda_{N_t}, \lambda_1]$.

Proof: See Appendix A. ■
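If the $x_i$ are i.i.d. positive random variables, then for any $x > 0$ we have

$$\Pr\left(\max_{i=1,\ldots,m} x_i < x\right) = \left(\Pr(x_i < x)\right)^m \tag{17}$$

for any choice of $m$. Using this fact in conjunction with Lemma 1, we have

$$\mathrm{E}_{\mathcal{C}}\left[\max_{i} \mathbf{f}_i^H\mathbf{H}^H\mathbf{H}\mathbf{f}_i - \lambda_{N_t}\right] = \int_{\lambda_{N_t}}^{\lambda_1} \Pr\left(\max_{i} \mathbf{f}_i^H\mathbf{H}^H\mathbf{H}\mathbf{f}_i > x\right) dx \tag{18}$$

$$= \lambda_1 - \lambda_{N_t} - \int_{\lambda_{N_t}}^{\lambda_1} \Pr\left(\max_{i} \mathbf{f}_i^H\mathbf{H}^H\mathbf{H}\mathbf{f}_i < x\right) dx \tag{19}$$

where (18) follows from a routine Fubini argument. Hence, upon rearrangement, we have

$$\Delta_1 = \frac{1}{\lambda_1}\left(\lambda_1 - \mathrm{E}_{\mathcal{C}}\left[\max_{i} \mathbf{f}_i^H\mathbf{H}^H\mathbf{H}\mathbf{f}_i\right]\right) \tag{20}$$

$$= \frac{1}{\lambda_1}\int_{\lambda_{N_t}}^{\lambda_1} \left(\Pr\left(\mathbf{f}^H\mathbf{H}^H\mathbf{H}\mathbf{f} < x\right)\right)^m dx \tag{21}$$

$$= \frac{1}{\lambda_1}\int_{\lambda_{N_t}}^{\lambda_1} \left(\Pr\left(\mathbf{f}^H\mathbf{\Lambda}\mathbf{f} < x\right)\right)^m dx \tag{22}$$

The quantities in (20)-(22) are as follows:
- The eigen-decomposition of $\mathbf{H}^H\mathbf{H}$ is $\mathbf{H}^H\mathbf{H} = \mathbf{U}\mathbf{\Lambda}\mathbf{U}^H$, with $\mathbf{\Lambda} = \mathrm{diag}([\lambda_1, \ldots, \lambda_{N_t}])$.
- In (21) and (22), $\mathbf{f}$ is an isotropically distributed vector in $\mathcal{G}(N_t, 1)$, and $m$ is particularized to $m = 2^B$.
- The step from (21) to (22) uses the fact that $\mathbf{U}^H\mathbf{f}$ is again isotropic by (7).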
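The rearrangement (20)-(22) can be checked numerically. The sketch below estimates $\Delta_1$ for $\mathbf{\Lambda} = \mathrm{diag}(3, 2, 1)$ and $B = 2$ in two ways, assuming isotropic vectors are drawn by normalizing complex Gaussian vectors:
1. directly, as in (20), by averaging over random codebooks; and
2. via (22), by integrating the $m$-th power of an empirical CDF of $\mathbf{f}^H\mathbf{\Lambda}\mathbf{f}$.

Working with $\mathbf{\Lambda}$ instead of $\mathbf{H}^H\mathbf{H}$ relies on the isotropy step from (21) to (22). Sample sizes, seeds, and the grid resolution are arbitrary choices, and the two estimates should agree up to Monte Carlo error.

```python
import bisect
import math
import random

def isotropic_unit_vector(nt, rng):
    # Normalized i.i.d. CN(0, 1) vector (assumed construction of an isotropic vector).
    g = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(nt)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in g))
    return [z / norm for z in g]

def weighted_norm(lam, f):
    """f^H Lambda f for Lambda = diag(lam)."""
    return sum(l * abs(z) ** 2 for l, z in zip(lam, f))

lam, bits = [3.0, 2.0, 1.0], 2
m = 2 ** bits
rng = random.Random(1)

# (20): average the normalized loss directly over many random RVQ codebooks.
n_codebooks = 5000
direct = sum(
    (lam[0] - max(weighted_norm(lam, isotropic_unit_vector(len(lam), rng)) for _ in range(m))) / lam[0]
    for _ in range(n_codebooks)
) / n_codebooks

# (22): integrate the m-th power of the empirical CDF of f^H Lambda f over [lambda_Nt, lambda_1].
samples = sorted(weighted_norm(lam, isotropic_unit_vector(len(lam), rng)) for _ in range(20000))

def empirical_cdf(x):
    return bisect.bisect_right(samples, x) / len(samples)

n_grid = 1000
width = (lam[0] - lam[-1]) / n_grid
via_cdf = sum(empirical_cdf(lam[-1] + (k + 0.5) * width) ** m for k in range(n_grid)) * width / lam[0]
print(round(direct, 3), round(via_cdf, 3))
```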



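B. Distribution Function of the Weighted-Norm of Unit-Norm Vectors

From the preceding discussion, computing $\Delta\mathrm{SNR}_{\mathrm{rx}}$ requires the distribution function of $\mathbf{f}^H\mathbf{\Lambda}\mathbf{f}$. This is a weighted norm of isotropically distributed beamforming vectors on $\mathcal{G}(N_t, 1)$, with weights given by the diagonal entries of $\mathbf{\Lambda}$. We start by characterizing the relevant distribution functions completely in the special cases of $N_t = 2, 3$. (A study of the general $N_t$ case follows.)

Lemma 2: Let $\mathbf{f}$ be an isotropically distributed unit-norm vector on $\mathcal{G}(N_t, 1)$. Let $\mathbf{\Lambda} = \mathrm{diag}([\lambda_1, \ldots, \lambda_{N_t}])$ be a fixed diagonal matrix with $\lambda_1 > \cdots > \lambda_{N_t} > 0$. The cumulative distribution function (CDF) $F(x)$ of $\mathbf{f}^H\mathbf{\Lambda}\mathbf{f}$ over the non-trivial support region, the interval $[\lambda_{N_t}, \lambda_1]$, is as follows:

$$F(x)\big|_{N_t=2} = \frac{x - \lambda_2}{\lambda_1 - \lambda_2}, \quad \lambda_2 \leq x \leq \lambda_1, \tag{23}$$

$$F(x)\big|_{N_t=3} = \begin{cases} \dfrac{(x-\lambda_3)^2}{(\lambda_1-\lambda_3)(\lambda_2-\lambda_3)}, & \lambda_3 \leq x \leq \lambda_2, \\[2ex] F(\lambda_2) + \dfrac{(x-\lambda_2)(2\lambda_1 - x - \lambda_2)}{(\lambda_1-\lambda_2)(\lambda_1-\lambda_3)}, & \lambda_2 \leq x \leq \lambda_1. \end{cases} \tag{24}$$

In the general $N_t$ case, $F(x)$ is too cumbersome to state, but its behavior over the segment $[\lambda_2, \lambda_1]$ is simple:

$$F(x) = 1 - \frac{(\lambda_1 - x)^{N_t-1}}{\prod_{i=2}^{N_t}(\lambda_1 - \lambda_i)}, \quad \lambda_2 \leq x \leq \lambda_1. \tag{25}$$

Proof: See Appendix B. ■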
A simple verification shows that $F(\lambda_1) = 1$ in all the cases, as expected. The distribution functions are derived in Appendix B by computing the volume of intersection of a complex ellipsoid with a unit-radius complex sphere. This computation mirrors and generalizes the computation in [8], where the volume of a spherical cap (the intersection of a plane with a unit-radius complex sphere) is obtained in closed form. This generalization is hard to visualize geometrically beyond the $N_t = 2$ case. Nevertheless, over $[\lambda_2, \lambda_1]$ it shows the same behavior as the distribution function in [8].

Fig. 1 illustrates the trends of the CDF by plotting the goodness-of-fit between the theoretical expressions in Lemma 2 and the CDF estimated via Monte Carlo methods. Three cases are considered:
a) $\mathbf{\Lambda} = \mathrm{diag}([2\ 1])$ for $N_t = 2$;
b) $\mathbf{\Lambda} = \mathrm{diag}([3\ 2\ 1])$ for $N_t = 3$; and
c) $\mathbf{\Lambda} = \mathrm{diag}([4\ 3\ 2\ 1])$ for $N_t = 4$.
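A Monte Carlo check in the spirit of Fig. 1 can be sketched as follows; sample sizes, seeds, and evaluation points are our own choices. The helper `lemma2_cdf` (a hypothetical name) implements (25) on $[\lambda_2, \lambda_1]$ for any $N_t$, and the lower branch of (24) for $N_t = 3$. Note that (23) is the $N_t = 2$ instance of (25), and the upper branch of (24) coincides with (25). For $N_t = 4$, only the closed-form segment $[\lambda_2, \lambda_1]$ is tested.

```python
import math
import random

def lemma2_cdf(x, lam):
    """CDF of f^H Lambda f from Lemma 2; lam is strictly decreasing and positive.

    Uses (25) on [lambda_2, lambda_1] (which covers (23) and the upper branch
    of (24)) and the lower branch of (24) when N_t = 3."""
    nt, l1, l2 = len(lam), lam[0], lam[1]
    if x <= lam[-1]:
        return 0.0
    if x >= l1:
        return 1.0
    if x >= l2:
        prod = 1.0
        for li in lam[1:]:
            prod *= l1 - li
        return 1.0 - (l1 - x) ** (nt - 1) / prod
    if nt == 3:
        l3 = lam[2]
        return (x - l3) ** 2 / ((l1 - l3) * (l2 - l3))
    raise ValueError("Lemma 2 gives F(x) below lambda_2 only for N_t = 3")

def isotropic_unit_vector(nt, rng):
    # Normalized i.i.d. CN(0, 1) vector (assumed construction of an isotropic vector).
    g = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(nt)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in g))
    return [z / norm for z in g]

def max_cdf_error(lam, points, n=20000, seed=2):
    """Largest gap between the empirical CDF of f^H Lambda f and Lemma 2 at the given points."""
    rng = random.Random(seed)
    draws = [sum(l * abs(z) ** 2 for l, z in zip(lam, isotropic_unit_vector(len(lam), rng)))
             for _ in range(n)]
    return max(abs(sum(d <= x for d in draws) / n - lemma2_cdf(x, lam)) for x in points)

# The three cases of Fig. 1; for N_t = 4 only the segment [lambda_2, lambda_1] is tested.
err2 = max_cdf_error([2.0, 1.0], [1.2, 1.5, 1.8])
err3 = max_cdf_error([3.0, 2.0, 1.0], [1.25, 1.5, 1.75, 2.25, 2.5, 2.75])
err4 = max_cdf_error([4.0, 3.0, 2.0, 1.0], [3.25, 3.5, 3.75])
print(err2, err3, err4)
```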



C. Main Result 

The following theorem captures the performance loss with RVQ codebooks. 




Fig. 1. CDF of weighted-norm of isotropically distributed unit-norm vectors. 



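Theorem 1: In the MIMO setting, in the special cases of $N_t = 2$ and $3$, we have

$$\Delta_1\big|_{N_t=2} = \left(1 - \frac{\lambda_2}{\lambda_1}\right)\frac{1}{m+1} \tag{26}$$

$$\Delta_1\big|_{N_t=3} = \frac{1}{2m+1}\left[\left(1 - \frac{\lambda_3}{\lambda_1}\right)\left(\frac{\lambda_2-\lambda_3}{\lambda_1-\lambda_3}\right)^m + \left(1 - \frac{\lambda_2}{\lambda_1}\right)\sum_{k=1}^{m}\left(\frac{\lambda_2-\lambda_3}{\lambda_1-\lambda_3}\right)^{m-k}\frac{2^k\, m(m-1)\cdots(m-k+1)}{(2m-1)(2m-3)\cdots(2m-2k+1)}\right] \tag{27}$$

where $m = 2^B$. In the general ($N_t \geq 4$) case, we have

$$\Delta_1 \approx \Delta_{1,\mathrm{appx}} \triangleq \left(1 - \frac{\lambda_2}{\lambda_1}\right)\sum_{k=0}^{m}(1-D)^{m-k}\,\frac{(N_t-1)^k\, m(m-1)\cdots(m-k+1)}{\prod_{j=0}^{k}\left((m-j)(N_t-1)+1\right)} \tag{28}$$

(empty products equal 1), where

$$D \triangleq \prod_{j=2}^{N_t}\frac{\lambda_1-\lambda_2}{\lambda_1-\lambda_j} \tag{29}$$

so that $1 - D = F(\lambda_2)$ by (25). Further, we have the following bounds:

$$\Delta_{1,\mathrm{appx}} \leq \Delta_1 \leq \Delta_{1,\mathrm{appx}} + \varepsilon_B, \tag{30}$$

$$\varepsilon_B \triangleq \frac{\lambda_2-\lambda_{N_t}}{\lambda_1}\,(1-D)^m. \tag{31}$$

We will show subsequently (see (38)-(40)) that $\varepsilon_B = o(\Delta_{1,\mathrm{appx}})$ for any $\mathbf{H}$. That is, $\Delta_{1,\mathrm{appx}}$ is a tight approximation to $\Delta_1$, with

$$\Delta_1 = \Delta_{1,\mathrm{appx}} + o(\Delta_{1,\mathrm{appx}}) \tag{32}$$

as $B \to \infty$.

Proof: Since $F(x)$ is monotonic, the dominant term of the integral in (22) in the general $N_t$ case comes from the interval $[\lambda_2, \lambda_1]$. Computing this dominant term yields the statement of the theorem. See Appendix C for details. ■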
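The closed forms in Theorem 1 are easy to evaluate. The sketch below uses hypothetical helper names and computes:
- $D$ from (29);
- $\Delta_{1,\mathrm{appx}}$ from (28), by accumulating the ratio of consecutive coefficients;
- $\varepsilon_B$ from (31); and
- the exact $N_t = 2, 3$ values.

For $N_t = 3$ it uses the fact that (27) equals $\Delta_{1,\mathrm{appx}}$ plus the contribution $\frac{\lambda_2-\lambda_3}{\lambda_1(2m+1)}(1-D)^m$ of the interval $[\lambda_3, \lambda_2]$. The sketch assumes $\lambda_1 > \lambda_2$ throughout, and strictly decreasing eigenvalues for `delta1_nt3`.

```python
import math

def d_factor(lam):
    """D in (29): product over j >= 2 of (lambda_1 - lambda_2)/(lambda_1 - lambda_j)."""
    l1, l2 = lam[0], lam[1]
    d = 1.0
    for lj in lam[2:]:              # the j = 2 factor equals 1 and is skipped
        d *= (l1 - l2) / (l1 - lj)
    return d

def delta1_appx(lam, bits):
    """Delta_{1,appx} in (28); assumes lambda_1 > lambda_2."""
    nt, m = len(lam), 2 ** bits
    c = 1.0 - d_factor(lam)         # c = 1 - D = F(lambda_2)
    coef = 1.0 / (m * (nt - 1) + 1)  # coefficient of the k = 0 term
    total = coef * c ** m
    for k in range(1, m + 1):
        # ratio between the k-th and (k-1)-th coefficients in (28)
        coef *= (nt - 1) * (m - k + 1) / ((m - k) * (nt - 1) + 1)
        total += coef * c ** (m - k)
    return (1.0 - lam[1] / lam[0]) * total

def eps_b(lam, bits):
    """epsilon_B in (31)."""
    return (lam[1] - lam[-1]) / lam[0] * (1.0 - d_factor(lam)) ** (2 ** bits)

def delta1_nt2(lam, bits):
    """Exact value (26) for N_t = 2."""
    return (1.0 - lam[1] / lam[0]) / (2 ** bits + 1)

def delta1_nt3(lam, bits):
    """Exact value (27) for N_t = 3, written as Delta_{1,appx} plus the [lambda_3, lambda_2] term."""
    l1, l2, l3 = lam
    m = 2 ** bits
    return delta1_appx(lam, bits) + (l2 - l3) / (l1 * (2 * m + 1)) * ((l2 - l3) / (l1 - l3)) ** m

lam3 = [3.0, 2.0, 1.0]
print(round(delta1_nt3(lam3, 2), 6))    # 0.191931
print(delta1_appx(lam3, 2), delta1_appx(lam3, 2) + eps_b(lam3, 2))   # the bounds in (30)
# Rank-1 profile: D = 1 and (28) collapses to Gamma(4/3) Gamma(9) / Gamma(9 + 1/3) for N_t = 4, B = 3.
print(delta1_appx([1.0, 0.0, 0.0, 0.0], 3), math.gamma(4 / 3) * math.gamma(9) / math.gamma(9 + 1 / 3))
```

For $\mathbf{\Lambda} = \mathrm{diag}(3, 2, 1)$ and $B = 2$, the exact value is $2902/15120 \approx 0.1919$. This matches a direct integration of (22) using the CDF in Lemma 2.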
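Consider the special cases where $\mathbf{H}$ is a MISO channel ($N_r = 1$) or is effectively a MISO channel ($\mathrm{rank}(\mathbf{H}^H\mathbf{H}) = 1$). Then $\Delta_1$ can be computed in closed form [30, Cor. 1], [15] as

$$\Delta_1 = \mathrm{E}_{\mathcal{C}}\left[\min_{i=1,\ldots,2^B} \sin^2(\theta_i)\right] = 2^B\,\beta\!\left(2^B, \frac{N_t}{N_t-1}\right) \tag{33}$$

Here, $\theta_i$ denotes the angle between $\mathbf{f}_i$ and $\mathbf{u}_{\mathbf{H}}$, the dominant right singular vector of $\mathbf{H}$, and

$$\beta(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\, dt \tag{34}$$

is the Beta function. The MISO setting can be obtained as a limiting case of Theorem 1 with $\lambda_2 = \cdots = \lambda_{N_t} \to 0$.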
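The closed form (33) can be checked against a direct simulation. In the sketch below (hypothetical helper names, assumed sample sizes), the Beta function is evaluated through `math.lgamma`. The simulation takes $\mathbf{u}_{\mathbf{H}} = \mathbf{e}_1$, which loses no generality because the codewords are isotropic. Then $\sin^2(\theta_i) = 1 - |\mathbf{f}_i(1)|^2$.

```python
import math
import random

def rank1_loss(nt, bits):
    """Right-hand side of (33): 2^B * beta(2^B, N_t/(N_t-1)), via log-gamma for stability."""
    m = 2 ** bits
    y = nt / (nt - 1)
    log_beta = math.lgamma(m) + math.lgamma(y) - math.lgamma(m + y)
    return m * math.exp(log_beta)

def rank1_loss_mc(nt, bits, n_codebooks=4000, seed=3):
    """Monte Carlo estimate of E_C[min_i sin^2(theta_i)], taking u_H = e_1 by isotropy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_codebooks):
        best = 1.0
        for _ in range(2 ** bits):
            g = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(nt)]
            cos2 = abs(g[0]) ** 2 / sum(abs(z) ** 2 for z in g)   # |f_i^H e_1|^2 after normalization
            best = min(best, 1.0 - cos2)
        total += best
    return total / n_codebooks

exact_nt2 = rank1_loss(2, 3)   # equals 1/(2^B + 1) = 1/9, consistent with (26) at lambda_2 = 0
exact_nt4 = rank1_loss(4, 3)
mc_nt4 = rank1_loss_mc(4, 3)
print(exact_nt2, exact_nt4, mc_nt4)
```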



D. Asymptotics of B 

Theorem 1 separates (to first order) the impact of the channel from that of the RVQ codebook, i.e., the number of bits $B$. Nevertheless, the expressions provided are too complicated to yield simple heuristic insights.

To overcome this difficulty, we now provide simplifications for $\Delta_1$ as $B \to \infty$. In the $N_t = 2$ setting, the expression for $\Delta_1$ is already simple. Thus, we start with the case of $N_t = 3$ and then study the $N_t \geq 4$ case.

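Proposition 1: In the $N_t = 3$ case, the dominant term of $\Delta_1$ behaves as

$$\Delta_1 \asymp \frac{\sqrt{\pi}}{2^{B/2+1}}\cdot\frac{\sqrt{(\lambda_1-\lambda_2)(\lambda_1-\lambda_3)}}{\lambda_1} \tag{35}$$

as $B \to \infty$. Similarly, in the $N_t \geq 4$ case, we have

$$\Delta_1 = \Delta_{1,\mathrm{asymp}} + O\!\left(2^{-\frac{BN_t}{N_t-1}}\right), \qquad \Delta_{1,\mathrm{asymp}} \triangleq \kappa_{N_t}\left(1 - \frac{\lambda_2}{\lambda_1}\right)\left(2^B D\right)^{-\frac{1}{N_t-1}} \tag{36}$$

where $\kappa_{N_t} \triangleq \Gamma\!\left(\frac{N_t}{N_t-1}\right)$ and $D$ is as in (29).

Proof: See Appendix D. ■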
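The sketch below gauges how quickly Prop. 1 takes over. It compares $\Delta_{1,\mathrm{appx}}$ from (28) with $\Delta_{1,\mathrm{asymp}}$ from (36) for the eigenvalue profiles [3 2 1] and [4 3 2 1]; the ratio should approach 1 as $B$ grows. It also checks that (36) with $N_t = 3$ reproduces (35), since then $D = (\lambda_1-\lambda_2)/(\lambda_1-\lambda_3)$ and $\Gamma(3/2) = \sqrt{\pi}/2$. Helper names are hypothetical, and the code repeats the Theorem 1 helpers so that it runs on its own.

```python
import math

def d_factor(lam):
    """D in (29)."""
    d = 1.0
    for lj in lam[2:]:
        d *= (lam[0] - lam[1]) / (lam[0] - lj)
    return d

def delta1_appx(lam, bits):
    """Delta_{1,appx} in (28); assumes lambda_1 > lambda_2."""
    nt, m = len(lam), 2 ** bits
    c = 1.0 - d_factor(lam)
    coef = 1.0 / (m * (nt - 1) + 1)
    total = coef * c ** m
    for k in range(1, m + 1):
        coef *= (nt - 1) * (m - k + 1) / ((m - k) * (nt - 1) + 1)
        total += coef * c ** (m - k)
    return (1.0 - lam[1] / lam[0]) * total

def delta1_asymp(lam, bits):
    """Delta_{1,asymp} in (36): Gamma(N_t/(N_t-1)) (1 - lambda_2/lambda_1) (2^B D)^(-1/(N_t-1))."""
    nt = len(lam)
    return (math.gamma(nt / (nt - 1)) * (1.0 - lam[1] / lam[0])
            * (2 ** bits * d_factor(lam)) ** (-1.0 / (nt - 1)))

def delta1_nt3_asymp(lam, bits):
    """Right-hand side of (35)."""
    l1, l2, l3 = lam
    return math.sqrt(math.pi) / 2 ** (bits / 2 + 1) * math.sqrt((l1 - l2) * (l1 - l3)) / l1

for lam in ([3.0, 2.0, 1.0], [4.0, 3.0, 2.0, 1.0]):
    for bits in (2, 6, 10):
        print(len(lam), bits, round(delta1_appx(lam, bits) / delta1_asymp(lam, bits), 4))
```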



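From Prop. 1 as well as (33), in the special case where $\mathrm{rank}(\mathbf{H}^H\mathbf{H}) = 1$, we have

$$\Delta_1 \asymp \Gamma\!\left(\frac{N_t}{N_t-1}\right) 2^{-\frac{B}{N_t-1}} \tag{37}$$

which is also established in [15], [30]. For the rate of convergence of $\varepsilon_B$ in (31) as $B \to \infty$, note that

$$\log\left(\frac{\varepsilon_B}{\Delta_{1,\mathrm{appx}}}\right) \overset{(a)}{\asymp} \log\left(\frac{\lambda_2-\lambda_{N_t}}{\lambda_1-\lambda_2}\right) + \frac{\log(D)}{N_t-1} + \frac{B}{N_t-1} - \log\Gamma\!\left(\frac{N_t}{N_t-1}\right) + 2^B\log(1-D) \tag{38}$$

$$\asymp -2^B\log\left(\frac{1}{1-D}\right) \tag{39}$$

$$= -2^B \cdot C_1 \tag{40}$$

Here, (a) follows from Prop. 1, and the factor $C_1$ is a constant for a given $\mathbf{H}$. Hence, $\varepsilon_B/\Delta_{1,\mathrm{appx}}$ decays doubly exponentially in $B$, which establishes (32).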
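The doubly exponential decay in (38)-(40) is visible numerically. For the profile [4 3 2 1], $D = 1/6$, so (39) predicts $\log(\varepsilon_B/\Delta_{1,\mathrm{appx}}) \approx -2^B\log(6/5)$ for large $B$. The sketch below tabulates both quantities for $B = 1, \ldots, 8$. Its helpers are hypothetical and repeated from the Theorem 1 sketch, and logarithms are to base 2 as in the rest of the paper.

```python
import math

def d_factor(lam):
    """D in (29)."""
    d = 1.0
    for lj in lam[2:]:
        d *= (lam[0] - lam[1]) / (lam[0] - lj)
    return d

def delta1_appx(lam, bits):
    """Delta_{1,appx} in (28); assumes lambda_1 > lambda_2."""
    nt, m = len(lam), 2 ** bits
    c = 1.0 - d_factor(lam)
    coef = 1.0 / (m * (nt - 1) + 1)
    total = coef * c ** m
    for k in range(1, m + 1):
        coef *= (nt - 1) * (m - k + 1) / ((m - k) * (nt - 1) + 1)
        total += coef * c ** (m - k)
    return (1.0 - lam[1] / lam[0]) * total

def eps_b(lam, bits):
    """epsilon_B in (31)."""
    return (lam[1] - lam[-1]) / lam[0] * (1.0 - d_factor(lam)) ** (2 ** bits)

lam = [4.0, 3.0, 2.0, 1.0]
d = d_factor(lam)   # 1/6 for this profile
log_ratios = []
for bits in range(1, 9):
    log_ratio = math.log2(eps_b(lam, bits) / delta1_appx(lam, bits))
    predicted = -(2 ** bits) * math.log2(1.0 / (1.0 - d))   # leading term in (39)
    log_ratios.append(log_ratio)
    print(bits, round(log_ratio, 2), round(predicted, 2))
```

The lower-order terms in (38) shrink relative to the $2^B$ term as $B$ grows. This is why the two columns approach each other in ratio but not in absolute difference.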

We now provide a numerical study to illustrate the theoretical results of Theorem 1 and to indicate how useful the asymptotic approximations are in the non-asymptotic regime. Three channel realizations of size $N_r \times N_t$, with $N_t = N_r \in \{2, 3, 4\}$, are generated randomly and then held constant, and the performance is averaged over 1000 RVQ codebooks. The squared singular values of the three channels are 1) [2 1], 2) [3 2 1], and 3) [4 3 2 1], respectively. Fig. 2 shows the match between three estimates of $\Delta_1$: the theoretical expressions in Theorem 1, the asymptotic approximations in Prop. 1, and Monte Carlo estimates. The asymptotic approximations are close even for small values of $B$ ($B > 2$), which is useful from a practically motivated limited-feedback perspective. Fig. 2 considers the goodness-of-fit of the three expressions for specific channel realizations only; their goodness-of-fit across a large family of channels is studied next.

IV. Ordering Channels Based on RVQ Performance 

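The focus of this section is to develop a basis (or a metric) to order a family of channels, so that RVQ performance over one channel can be compared with performance over another. In particular, we are interested in the conditions on channels $\mathbf{H}_1$ and $\mathbf{H}_2$ that are critical to ensure that

$$\Delta_1\big|_{\mathbf{H}_1} \leq \Delta_1\big|_{\mathbf{H}_2}. \tag{41}$$

Let $\boldsymbol{\lambda} = [\lambda_1, \ldots, \lambda_{N_t}]$ and $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_{N_t}]$ denote the vectors of squared singular values of $\mathbf{H}_1$ and $\mathbf{H}_2$, with $\lambda_1 \geq \cdots \geq \lambda_{N_t} \geq 0$ and $\mu_1 \geq \cdots \geq \mu_{N_t} \geq 0$. In the special case of $N_t = 2$, Theorem 1 shows that

$$\Delta_1\big|_{\mathbf{H}_1} \leq \Delta_1\big|_{\mathbf{H}_2} \iff \frac{\lambda_2}{\lambda_1} \geq \frac{\mu_2}{\mu_1}. \tag{42}$$

With $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$ normalized such that

$$\lambda_1 + \lambda_2 = \rho_c = \mu_1 + \mu_2, \tag{43}$$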



Fig. 2. Goodness-of-fit of different estimates of $\Delta_1$ as a function of $B$.



(42) is equivalent to $\lambda_1 \leq \mu_1$ or, equivalently, $\lambda_2 \geq \mu_2$. To make this connection more precise in the general $N_t$ case, we assume that the channels are normalized such that

$$\sum_{i=1}^{N_t}\lambda_i = \mathrm{Tr}(\mathbf{H}_1^H\mathbf{H}_1) = \mathrm{Tr}(\mathbf{H}_2^H\mathbf{H}_2) = \sum_{i=1}^{N_t}\mu_i = \rho_c, \tag{44}$$

where $\rho_c$ denotes the channel power. This normalization is commonly used in multi-antenna channel measurement studies. It keeps the channel power fixed, independent of the distance between the transmitter and the receiver and of the energy of the scattering phenomena. See [39] for a discussion of channel power normalization issues.

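We also define the notions of a majorization ordering and a Schur-convex function [42].

Definition 2 (Schur-convex function): We say that $\boldsymbol{\lambda}$ is majorized by $\boldsymbol{\mu}$ (denoted as $\boldsymbol{\lambda} \prec \boldsymbol{\mu}$) if

$$\sum_{i=1}^{k}\lambda_i \leq \sum_{i=1}^{k}\mu_i, \quad 1 \leq k \leq N_t, \tag{45}$$

with equality for $k = N_t$. When $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$ denote the vectors of squared singular values of $\mathbf{H}_1$ and $\mathbf{H}_2$, respectively, equality in (45) for $k = N_t$ is a consequence of (44).

Let $f(\cdot)$ be a function such that $f: \mathbb{R}_+^{N_t} \to \mathbb{R}$. We say that $f(\cdot)$ is Schur-convex on $\mathbb{R}_+^{N_t}$ if

$$\mathbf{x} \prec \mathbf{y} \implies f(\mathbf{x}) \leq f(\mathbf{y}). \tag{46}$$

The function $f(\cdot)$ is Schur-concave if $-f(\cdot)$ is Schur-convex. ■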
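Checking (45) mechanically is straightforward. The sketch below implements a majorization test with a hypothetical helper, `majorized_by`. It first sorts both vectors in decreasing order; this is an assumption of the sketch, since (45) is stated for vectors already arranged that way. It also uses a small tolerance for floating-point partial sums. The second example shows a pair that majorization cannot compare, which is why majorization only induces a partial ordering.

```python
def majorized_by(lam, mu, tol=1e-12):
    """Return True if lam is majorized by mu in the sense of (45).

    Both vectors are sorted in decreasing order first; partial sums of lam
    must never exceed those of mu, and the totals must agree."""
    a = sorted(lam, reverse=True)
    b = sorted(mu, reverse=True)
    if len(a) != len(b) or abs(sum(a) - sum(b)) > tol:
        return False
    partial_a = partial_b = 0.0
    for x, y in zip(a, b):
        partial_a += x
        partial_b += y
        if partial_a > partial_b + tol:
            return False
    return True

rho_c, nt = 1.0, 4
flat = [rho_c / nt] * nt                 # well-conditioned profile
rank1 = [rho_c] + [0.0] * (nt - 1)       # rank-1 profile
print(majorized_by(flat, rank1), majorized_by(rank1, flat))   # True False
# Two vectors that majorization cannot compare (a partial ordering):
print(majorized_by([0.6, 0.2, 0.2], [0.5, 0.4, 0.1]),
      majorized_by([0.5, 0.4, 0.1], [0.6, 0.2, 0.2]))         # False False
```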
With this background, the main result of this section is as follows. 



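Theorem 2: The normalized received SNR loss is a Schur-convex function of the squared singular values of the channel. That is, if $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$ denote the vectors of squared singular values of $\mathbf{H}_1$ and $\mathbf{H}_2$ with $\boldsymbol{\lambda} \prec \boldsymbol{\mu}$, we have

$$\Delta_1\big|_{\mathbf{H}_1} \leq \Delta_1\big|_{\mathbf{H}_2}. \tag{47}$$

Proof: See Appendix E. ■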
Some comments are in order at this stage. 

1) Note that the conclusion of Theorem 2 is difficult to draw either from the exact expression in the $N_t = 3$ case or from the approximate/asymptotic expressions of Sec. III. Theorem 2 provides a continuous ordering on the space of all possible (majorizable) channels with respect to RVQ performance. Similar results exploiting majorization theory have been obtained for several settings (see [21], [44], [45] for details), including:
- the ergodic capacity of MISO systems [43];
- the outage probability of MISO systems;
- the error performance of orthogonal space-time block codes;
- the performance analysis of precoding in MIMO systems [44]; and
- the performance of CDMA systems.

Theorem 2 leads us to the following conclusion.

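Corollary 1: Any channel $\mathbf{H}$ with the vector of squared singular values denoted by $\boldsymbol{\lambda}$ satisfies

$$\left[\frac{\rho_c}{N_t}, \ldots, \frac{\rho_c}{N_t}\right] \prec \boldsymbol{\lambda} \prec [\rho_c, 0, \ldots, 0], \tag{48}$$

resulting in

$$\Delta_1\big|_{[\rho_c/N_t, \ldots, \rho_c/N_t]} \leq \Delta_1\big|_{\mathbf{H}} \leq \Delta_1\big|_{[\rho_c, 0, \ldots, 0]}. \tag{49}$$

In other words, the best channel with respect to RVQ performance is well-conditioned ($\mathbf{H}^H\mathbf{H} = \frac{\rho_c}{N_t}\mathbf{I}$), with squared condition number $\chi_{\mathbf{H}^H\mathbf{H}}$ equal to 1. The worst channel is a rank-1 channel. ■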
This conclusion fits within the theme of source-channel matching for signalling design in single-user MIMO systems, established in [21]. That theme states that the best channel for a specific signalling scheme is the one that optimizes an appropriately defined matching metric for that scheme. Here, the scheme is beamforming with $\Delta_1$ as the chosen metric, and an RVQ codebook has isotropic vectors (equally likely to beamform along any direction). The channel best suited to this scheme should therefore also have dominant right singular vectors that are isotropic in $\mathcal{G}(N_t, 1)$. This choice leads us to the i.i.d. channel matrix [7], [10]. Conversely, a rank-1 channel with a fixed right singular vector is ill-suited to an RVQ codebook, which is "wasteful" because it beamforms isotropically in $\mathcal{G}(N_t, 1)$.
2) We now provide two specific examples to illustrate the dependence of $\Delta_1$ on the rank of the channel and on its condition number.



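Corollary 2: Note that

$$\left[\frac{\rho_c}{N_t}, \ldots, \frac{\rho_c}{N_t}\right] \prec \cdots \prec \Big[\underbrace{\frac{\rho_c}{r}, \ldots, \frac{\rho_c}{r}}_{r \text{ times}}, \underbrace{0, \ldots, 0}_{N_t - r \text{ times}}\Big] \prec \cdots \prec [\rho_c, 0, \ldots, 0]. \tag{50}$$

Thus, $\Delta_1$ increases as the rank $r$ of the channel decreases. Further, within the family of channels with the same rank $r$, $\Delta_1$ increases as the $r$ non-zero squared singular values become more ill-conditioned. ■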
Fig. 3. Received SNR loss for channels ordered via a majorization relationship as a function of B.



Fig. 3 plots $\Delta_1$ as a function of $B$ across a family of 150 channels that can be continuously majorized as follows. With $N_t = N_r = 4$ and $\rho_c$ set arbitrarily to 1 (without loss in generality), the squared singular values for the $i$-th channel are given as

$$\boldsymbol{\lambda}_i = [1 - x_i,\; x_i/3,\; x_i/3,\; x_i/3] \tag{51}$$

where $x_i$ increases from 0.01 to 0.75 in steps of 0.005. It can be seen that for any $i$,

$$\boldsymbol{\lambda}_i \prec \boldsymbol{\lambda}_j, \quad 1 \leq j \leq i - 1, \tag{52}$$

and the channel becomes more well-conditioned as $i$ increases. At the same time, $\Delta_1$ continuously decreases, thus illustrating Theorem 2.
3) Majorization provides an ordering metric to compare channels with respect to RVQ performance. However, the metric only induces a partial ordering on the family of channels, because some channels cannot be compared via a majorization relationship. A simplified, albeit approximate, channel ordering metric that reflects the condition number of the channel and allows an approximate complete ordering of channels



is Ai. However, numerical results illustrating the efficacy of this metric are not provided 
here for the sake of brevity. 
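The trend in Fig. 3 can be reproduced in miniature. The sketch below rests on two assumptions. First, as in Sec. III and (154), $\Delta_1 = \lambda_1 - E_{\mathcal{C}}[\max_i \mathbf{f}_i^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{f}_i]$. Second, by Lemma 1, $\mathbf{H}^\dagger\mathbf{H}$ may be replaced by $\mathrm{diag}(\boldsymbol{\lambda})$.

Under these assumptions, the code estimates $\Delta_1$ by Monte Carlo for members of the family (51). It compares each estimate with the integral $\int_{\lambda_{N_t}}^{\lambda_1}(\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < x))^m dx$. For $\boldsymbol{\lambda} = [a, b, b, b]$, this integral reduces to a one-dimensional integral, because $|\mathbf{f}(1)|^2 \sim \mathrm{Beta}(1, N_t - 1)$. Function names, seeds and trial counts are illustrative choices.

```python
import math
import random


def isotropic_unit_vector(n, rng):
    """Unit-norm vector drawn uniformly from the complex sphere in C^n."""
    z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in z))
    return [c / norm for c in z]


def rvq_loss_mc(lam, bits, trials, rng):
    """Monte Carlo Delta_1 = lambda_1 - E[max_i f_i^H diag(lam) f_i],
    drawing a fresh RVQ codebook of 2**bits vectors in every trial."""
    m = 2 ** bits
    total = 0.0
    for _ in range(trials):
        total += max(
            sum(l * abs(c) ** 2 for l, c in zip(lam, isotropic_unit_vector(len(lam), rng)))
            for _ in range(m)
        )
    return max(lam) - total / trials


def rvq_loss_integral(a, b, n_t, bits, steps=2000):
    """Delta_1 = int (Pr(f^H Lam f < x))^m dx for lam = [a, b, ..., b] with a >= b.
    Substituting x = b + (a - b) u and Pr(|f(1)|^2 < u) = 1 - (1 - u)**(n_t - 1),
    the integral is evaluated with composite Simpson's rule (steps must be even)."""
    m = 2 ** bits
    g = lambda u: (1.0 - (1.0 - u) ** (n_t - 1)) ** m
    h = 1.0 / steps
    s = g(0.0) + g(1.0) + sum((4 if k % 2 else 2) * g(k * h) for k in range(1, steps))
    return (a - b) * s * h / 3.0


rng = random.Random(1)
for x in (0.01, 0.3, 0.6, 0.75):
    lam = [1 - x, x / 3, x / 3, x / 3]   # family (51) with rho_c = 1
    print(x, rvq_loss_mc(lam, 2, 4000, rng), rvq_loss_integral(1 - x, x / 3, 4, 2))
```

As $x$ grows, the profile is majorized by its predecessors and both estimates of $\Delta_1$ shrink. At $x = 0.75$ the profile is flat and the loss vanishes.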
In general, we would like to study the behavior of $\Delta\mathrm{SNR}_{\mathrm{rx}} = E_{\mathbf{H}}[\Delta_1]$.
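Proposition 2: In the special case where $\{N_t, N_r\} \rightarrow \infty$ with $\frac{N_t}{N_r} \rightarrow 0$, the singular values of $\mathbf{H}$ converge (harden) [21], [46] as follows:

$$\lambda_i(\mathbf{H}^\dagger\mathbf{H}) \rightarrow \lambda_i\left(E[\mathbf{H}^\dagger\mathbf{H}]\right) = \lambda_i(\boldsymbol{\Sigma}_t), \quad i = 1, \ldots, N_t. \tag{53}$$

Hence, we have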



©1 ^ 



V2 

with the approximation holding up to a multiplicative constant that depends on the antenna 
dimensions and B. ■ 
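Note that $V_1$ is minimized when $\lambda_1(\boldsymbol{\Sigma}_t) \approx \lambda_2(\boldsymbol{\Sigma}_t)$, whereas $V_2$ is minimized when $\lambda_2(\boldsymbol{\Sigma}_t) \approx \cdots \approx \lambda_{N_t}(\boldsymbol{\Sigma}_t) \approx 0$. However, $V_1 V_2$ is minimized when $\boldsymbol{\Sigma}_t$ is well-conditioned. Apart from this case, estimating $\Delta\mathrm{SNR}_{\mathrm{rx}}$ appears to be difficult in general. We therefore resort to numerical studies to examine the trends of $\Delta\mathrm{SNR}_{\mathrm{rx}}$.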

Following the discussion in the context of channel ordering, we expect the performance loss with RVQ to decrease as the rank of $\boldsymbol{\Sigma}_t$ increases, since the condition number of the channel then decreases on average. Fig. 4(a) illustrates this heuristic with four channels generated according to the Kronecker-product correlation model. The eigenvalues of $\boldsymbol{\Sigma}_r$ of the four channels are fixed as $1.6 \times [4\ 3\ 2\ 1]$, where the factor of 1.6 ensures that $\mathrm{Tr}(\boldsymbol{\Sigma}_r) = N_t N_r = 16$. The eigenvalues of $\boldsymbol{\Sigma}_t$ are as follows: 1) $[16\ 0\ 0\ 0]$, 2) $[8\ 8\ 0\ 0]$, 3) $[16/3\ 16/3\ 16/3\ 0]$, 4) $[4\ 4\ 4\ 4]$, ensuring that $\mathrm{Tr}(\boldsymbol{\Sigma}_t) = 16$ in all four cases.
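The rank heuristic can be checked quickly for the deterministic equal-power profiles of Corollary 2. This sketch covers only the instantaneous normalized loss $\Delta_1/\lambda_1$; it does not reproduce the Kronecker-averaged $\Delta\mathrm{SNR}_{\mathrm{rx}}$ of Fig. 4(a).

It uses the fact that $(|\mathbf{f}(1)|^2, \ldots, |\mathbf{f}(N_t)|^2)$ is uniform on the simplex for an isotropic $\mathbf{f} \in \mathbb{C}^{N_t}$. Consequently, for $\boldsymbol{\lambda} = [\rho_c/r, \ldots, \rho_c/r, 0, \ldots, 0]$, the ratio $\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f}/\lambda_1$ is $\mathrm{Beta}(r, N_t - r)$-distributed. The function name is hypothetical.

```python
import random


def normalized_loss_rank_r(n_t, r, bits, trials, rng):
    """Monte Carlo estimate of Delta_1 / lambda_1 for the equal-power rank-r
    profile [rho_c/r]*r + [0]*(n_t - r); the result does not depend on rho_c."""
    if not 1 <= r <= n_t:
        raise ValueError("rank must satisfy 1 <= r <= n_t")
    if r == n_t:
        return 0.0  # every unit-norm f gives f^H Lam f = lambda_1
    m = 2 ** bits
    best_sum = 0.0
    for _ in range(trials):
        # f^H Lam f / lambda_1 = sum of r Dirichlet(1, ..., 1) weights ~ Beta(r, n_t - r)
        best_sum += max(rng.betavariate(r, n_t - r) for _ in range(m))
    return 1.0 - best_sum / trials


rng = random.Random(3)
print([round(normalized_loss_rank_r(4, r, 2, 5000, rng), 3) for r in (1, 2, 3, 4)])
```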

V. Mutual Information Loss 
Following the same development as in Sec. III, we can write $\Delta I$ as



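$$\Delta I = E_{\mathbf{H}}[\Delta_2], \quad \Delta_2 = \int_{L}^{U} \left(\Pr(X < x)\right)^m dx \tag{55}$$

where $X = \log(1 + \rho\, \mathbf{f}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{f})$, $m = 2^B$,

$$L = \log(1 + \rho\lambda_{N_t}), \quad \text{and} \quad U = \log(1 + \rho\lambda_1). \tag{56}$$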

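It is easy to see that

$$\Delta_2 = \frac{\rho}{\log_e(2)} \int_{\lambda_{N_t}}^{\lambda_1} \frac{\left(\Pr(\mathbf{f}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{f} < x)\right)^m}{1 + \rho x}\, dx \tag{57}$$

$$= \frac{\rho}{\log_e(2)} \int_{\lambda_{N_t}}^{\lambda_1} \frac{\left(\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < x)\right)^m}{1 + \rho x}\, dx. \tag{58}$$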
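For $N_t = 2$, Lemma 3 (Appendix B) gives $\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < x) = (x - \lambda_2)/(\lambda_1 - \lambda_2)$ on $[\lambda_2, \lambda_1]$, so $\Delta_2 = \frac{\rho}{\log_e(2)}\int_{\lambda_2}^{\lambda_1}\frac{((x-\lambda_2)/(\lambda_1-\lambda_2))^m}{1+\rho x}dx$ is a one-dimensional integral. The minimal sketch below assumes the logarithms in the mutual information are base 2, which is what the factor $1/\log_e(2)$ suggests.

It compares Simpson's rule on that integral with a direct Monte Carlo estimate of $\log_2(1 + \rho\lambda_1) - E_{\mathcal{C}}[\max_i \log_2(1 + \rho\,\mathbf{f}_i^\dagger\boldsymbol{\Lambda}\mathbf{f}_i)]$. Names and parameter values are illustrative.

```python
import math
import random


def delta2_quadrature(lam1, lam2, rho, bits, steps=2000):
    """Delta_2 from the integral over [lam2, lam1] for N_t = 2, using the
    uniform law of f^H Lam f on that interval (composite Simpson's rule)."""
    m = 2 ** bits
    g = lambda x: ((x - lam2) / (lam1 - lam2)) ** m / (1.0 + rho * x)
    h = (lam1 - lam2) / steps
    s = g(lam2) + g(lam1) + sum((4 if k % 2 else 2) * g(lam2 + k * h) for k in range(1, steps))
    return rho / math.log(2.0) * s * h / 3.0


def delta2_monte_carlo(lam1, lam2, rho, bits, trials, rng):
    """log2(1 + rho lam1) - E[max_i log2(1 + rho f_i^H Lam f_i)] with N_t = 2.
    Since log is increasing, the best codeword maximizes f^H Lam f, which equals
    lam2 + (lam1 - lam2) u with u = |f(1)|^2 uniform on [0, 1]."""
    m = 2 ** bits
    acc = 0.0
    for _ in range(trials):
        best = lam2 + (lam1 - lam2) * max(rng.random() for _ in range(m))
        acc += math.log2(1.0 + rho * best)
    return math.log2(1.0 + rho * lam1) - acc / trials


rng = random.Random(5)
print(delta2_quadrature(2.0, 1.0, 10.0, 2), delta2_monte_carlo(2.0, 1.0, 10.0, 2, 20000, rng))
```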




In contrast to the development in Sec. III, where the integrand is monotonically increasing, the integrand in (58) is not necessarily monotonic, since it is a ratio of two increasing functions. Nevertheless, a simple bounding argument captures the trend of $\Delta_2$, as shown next.
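Corollary 3: The following asymptotic trend holds for $\Delta_2$:

$$\Delta_2 \asymp \frac{\rho(\lambda_1 - \lambda_2)}{\log_e(2)(N_t - 1)}\, \kappa\, 2^{-\frac{B}{N_t - 1}} \left(1 + \frac{D}{(1 - D)(N_t - 1)}\right) + o\left(2^{-\frac{B}{N_t - 1}}\right) \tag{59}$$

where $\kappa$ and $D$ are as in Theorem 1.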

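Proof: A trivial bound for $\frac{1}{1 + \rho x}$ in (58) over the interval $[\lambda_2, \lambda_1]$ implies that the dominant term of $\Delta_2$ (denoted as $\Delta_{2,\mathrm{appx}}$) can be bounded as

$$\frac{\rho}{1 + \rho\lambda_1} \leq \frac{\Delta_{2,\mathrm{appx}} \cdot \log_e(2)}{\int_{\lambda_2}^{\lambda_1} \left(\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < x)\right)^m dx} \leq \frac{\rho}{1 + \rho\lambda_2}. \tag{60}$$

A consequence of the computation in Theorem 1 is that

$$\underline{\Delta}_2 \leq \Delta_{2,\mathrm{appx}} \leq \overline{\Delta}_2 \tag{61}$$

with

$$\underline{\Delta}_2 = \frac{\rho}{\log_e(2)} \cdot A_{N_t} \cdot \frac{\lambda_1 - \lambda_2}{1 + \rho\lambda_1} \cdot C_1 \tag{62}$$

$$\overline{\Delta}_2 = \frac{\rho}{\log_e(2)} \cdot A_{N_t} \cdot \frac{\lambda_1 - \lambda_2}{1 + \rho\lambda_2} \cdot C_1 \tag{63}$$

where

$$C_1 = D^m + \sum_{k=1}^{m} \frac{2^k \cdot m(m-1)\cdots(m-k+1)}{(2m+p-1)\cdots(2m+p-2k+1)}\, D^{m-k} \tag{64}$$

and we have reused the notations $\{A_{N_t}, p, D\}$ from Theorem 1. It is straightforward to see that

$$\underline{\Delta}_2 \asymp \frac{\rho(\lambda_1 - \lambda_2)}{\log_e(2)(N_t - 1)(1 + \rho\lambda_1)}\, \kappa\, 2^{-\frac{B}{N_t - 1}} \left(1 + \frac{D}{(1 - D)(N_t - 1)}\right) + o\left(2^{-\frac{B}{N_t - 1}}\right) \tag{65}$$

$$\overline{\Delta}_2 \asymp \frac{\rho(\lambda_1 - \lambda_2)}{\log_e(2)(N_t - 1)(1 + \rho\lambda_2)}\, \kappa\, 2^{-\frac{B}{N_t - 1}} \left(1 + \frac{D}{(1 - D)(N_t - 1)}\right) + o\left(2^{-\frac{B}{N_t - 1}}\right) \tag{66}$$

and thus we have (59). ■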
While Cor. 3 captures the asymptotic trend of $\Delta_2$ via trivial bounding, it is not tight when $\lambda_1 \gg \lambda_2$. In these situations, a tighter estimate for $\Delta_2$ is useful, as developed next.

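Theorem 3: In the $N_t = 2$ case, we have

$$\Delta_2 = \frac{(-1)^m}{\log_e(2) \cdot z^m}\left[\log_e(1 + z) - \sum_{t=1}^{m} \frac{(-1)^{t+1} z^t}{t}\right] \tag{67}$$

where $m = 2^B$ and $z = \frac{\rho(\lambda_1 - \lambda_2)}{1 + \rho\lambda_2}$. In the general $N_t$ case, we have the following approximations: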



A, 



1+PX2 

loge(2) 



(iV,-l)(l + pAi) ^ 



■E 



X 



m—k 



2^ ■ m^m — 1) ■ ■ ■ {m — k + 1) ■ D 
^ (2m + Pi - 1) ■ ■ ■ (2m + - 2A; + 1) 



A 



2, appx; 



1 



7 



pA 



Nt-1 



1 + pAi' 



and Pi 



Further, we have 



where 



< 



'B 



HAi-A, 

A2 — A2, appx ^ , 

A^ -"^ 

p{\2 - \Nt) 

(l + pA^J-log,(2) A2, appx 



2(i + 1) 



1. 



B) 



-2^ log 



Thus and 



Ao = A 



2, appx 



o(A2, 



appx 7 



(68) 
(69) 

(70) 

(71) 
(72) 

(73) 



as S — )■ 00. 

Proof: See Appendix F. ■
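The $N_t = 2$ closed form of Theorem 3 can be sanity-checked numerically. As a hedged sketch, we take it in the form $\Delta_2 = \frac{(-1)^m}{\log_e(2)\, z^m}\left[\log_e(1+z) - \sum_{t=1}^{m}\frac{(-1)^{t+1}z^t}{t}\right]$, with $z = \rho(\lambda_1-\lambda_2)/(1+\rho\lambda_2)$.

This form follows from substituting $u = (x-\lambda_2)/(\lambda_1-\lambda_2)$ in (58). With the uniform law of Lemma 3, the substitution gives $\Delta_2 = \frac{1}{\log_e(2)}\int_0^1 \frac{z u^m}{1+zu}\,du$. The code compares the closed form with Simpson's rule on that integral. Function names are our own.

```python
import math


def delta2_closed_form(lam1, lam2, rho, bits):
    """N_t = 2 closed form:
    (-1)^m / (ln2 * z^m) * [ln(1 + z) - sum_{t=1}^m (-1)^(t+1) z^t / t]."""
    m = 2 ** bits
    z = rho * (lam1 - lam2) / (1.0 + rho * lam2)
    partial = sum((-1) ** (t + 1) * z ** t / t for t in range(1, m + 1))
    return (-1) ** m * (math.log1p(z) - partial) / (math.log(2.0) * z ** m)


def delta2_integral(lam1, lam2, rho, bits, steps=2000):
    """Simpson's rule on (1/ln2) * int_0^1 z u^m / (1 + z u) du."""
    m = 2 ** bits
    z = rho * (lam1 - lam2) / (1.0 + rho * lam2)
    g = lambda u: z * u ** m / (1.0 + z * u)
    h = 1.0 / steps
    s = g(0.0) + g(1.0) + sum((4 if k % 2 else 2) * g(k * h) for k in range(1, steps))
    return s * h / 3.0 / math.log(2.0)


for bits in (1, 2, 3):
    print(bits, delta2_closed_form(2.0, 1.0, 10.0, bits), delta2_integral(2.0, 1.0, 10.0, bits))
```

The following is our own numerical observation, not a claim from the paper. When $z < 1$ and $m$ is large (for example $\rho = 1$, $\boldsymbol{\lambda} = [2, 1]$, $B = 6$), the bracket is a tiny difference of $O(1)$ quantities. In double precision the closed form then loses accuracy, whereas the integral form does not.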



An alternate expansion for $\Delta_2$ is also presented in Appendix F. This expansion corresponds to an alternate form of the integrand in (58) and is captured by a series whose terms alternate in sign. In this sense, the alternate expansion generalizes (67). From a numerical standpoint, this oscillatory nature is unattractive because the series does not converge; (68)-(69) overcome this problem. We now study the asymptotic trends of $\Delta_2$.

Proposition 3: In the $N_t = 2$ case, two possibilities arise as $B$ increases, depending on the relationship between $\rho$, $\lambda_1$ and $\lambda_2$. We have



Ac 



= (2) ' 
(^-1) 



log,{2) 2-2:logj2:)-(m-l)' 

In the general $N_t$ case, as $B \rightarrow \infty$, we have



A, 



2 



p(Ai - A2) 

log,(2)(iV,-l) ■ 1 + pAi 



z < 1 

Z>1. 

D 



(74) 



D 



+0 2 



(75) 



^2, asymp 



where $\kappa = \Gamma\left(\frac{1}{N_t - 1}\right)$ and $D$ is as in (29).

Proof: See Appendix G. ■
We now illustrate the above theoretical results in Fig. 5(a) and (b), where we plot the instantaneous mutual information loss both theoretically and via Monte Carlo averaging. The squared singular values of the three channels are (as before): 1) [2 1] for $N_t = 2$, 2) [3 2 1] for $N_t = 3$, and 3) [4 3 2 1] for $N_t = 4$. The asymptotic and approximate expressions are tight for small $B$ values as long as $\rho$ is not too large. On the other hand, Fig. 5(b) illustrates the trend of $\Delta I$ as a function of the rank of $\boldsymbol{\Sigma}_t$. The channel data used to generate Fig. 5(b) is the same as that used for generating Fig. 4(a) (see the discussion there).



VI. Skewed Codebooks for Correlated Channels 

From (22) and (55), the asymptotic optimality of RVQ codebooks in the correlated case is obvious. That is, $\Delta\mathrm{SNR}_{\mathrm{rx}} \rightarrow 0$ and $\Delta I \rightarrow 0$ as $B \rightarrow \infty$, independent of the channel correlation profile, since a probability term in the integrand is raised to the power of $m = 2^B \rightarrow \infty$. Nevertheless, this does not mean that RVQ codebooks are optimal for every finite value of $B$ in the correlated case.

The ensemble averaging of RVQ codebooks is necessary to make constructive statements about their performance. However, certain fixed constructions may significantly outperform other constructions for small values of $B$. In fact, it is well known that codebooks exploiting the channel correlation structure clearly outperform Grassmannian codebooks (and thus, in principle, RVQ codebooks) for small $B$. It is also known that the condition number of the channel determines the performance of these codebooks [20], [25]-[29].




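To improve over the RVQ performance for finite values of $B$, we consider a codebook $\mathcal{C}_{\mathrm{sk}}$ obtained as follows: the RVQ codebook $\mathcal{C} = \{\mathbf{f}_i, i = 1, \ldots, 2^B\}$ is skewed by a fixed $N_t \times N_t$ matrix $\mathbf{A}$ and then normalized:

$$\mathcal{C}_{\mathrm{sk}} = \left\{\frac{\mathbf{A}\mathbf{f}_i}{\|\mathbf{A}\mathbf{f}_i\|},\; i = 1, \ldots, 2^B\right\}. \tag{76}$$

The relative received SNR loss with $\mathcal{C}_{\mathrm{sk}}$ is given as

$$\Delta_{1,\mathrm{sk}} = E_{\mathcal{C}}\left[1 - \frac{1}{\lambda_1(\mathbf{H}^\dagger\mathbf{H})} \max_{i} \frac{\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A}\mathbf{f}_i}{\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{A}\mathbf{f}_i}\right] \tag{77}$$

and the broad goal is to design $\mathbf{A}_{\mathrm{opt}}$, where

$$\mathbf{A}_{\mathrm{opt}} = \arg\min_{\mathbf{A}} E_{\mathbf{H}}\left[\Delta_{1,\mathrm{sk}}\right]. \tag{78}$$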



A. Equivalent Characterization of $\Delta_{1,\mathrm{sk}}$

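In this direction, a simple transformation argument ($\frac{\mathbf{A}\mathbf{f}_i}{\|\mathbf{A}\mathbf{f}_i\|} \mapsto \mathbf{v} \in \mathcal{G}(N_t, 1)$) allows us to check that

$$\lambda_{N_t}(\mathbf{H}^\dagger\mathbf{H}) \leq \frac{\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A}\mathbf{f}_i}{\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{A}\mathbf{f}_i} \leq \lambda_1(\mathbf{H}^\dagger\mathbf{H}), \quad i = 1, \ldots, 2^B. \tag{79}$$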



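We will henceforth denote the explicit dependence of $\mathbf{H}^\dagger\mathbf{H}$ on $\lambda_1$ and use the notation $\lambda_1(\mathbf{H}^\dagger\mathbf{H})$ to distinguish it from the eigenvalues of $\mathbf{A}^\dagger\mathbf{A}$ and $\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A}$.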



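Further, along the lines of Lemma 1, it can also be checked that $\left\{\frac{\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A}\mathbf{f}_i}{\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{A}\mathbf{f}_i}\right\}$ are i.i.d. Hence, as in Sec. III, we can rewrite $\Delta_{1,\mathrm{sk}}$ as

$$\Delta_{1,\mathrm{sk}} \cdot \lambda_1(\mathbf{H}^\dagger\mathbf{H}) = \int_{\lambda_{N_t}(\mathbf{H}^\dagger\mathbf{H})}^{\lambda_1(\mathbf{H}^\dagger\mathbf{H})} \left(\Pr\left(\frac{\mathbf{f}^\dagger\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A}\mathbf{f}}{\mathbf{f}^\dagger\mathbf{A}^\dagger\mathbf{A}\mathbf{f}} < x \,\middle|\, \|\mathbf{f}\| = 1\right)\right)^m dx \tag{80}$$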

where $\mathbf{f}$ is an isotropically distributed unit-norm random vector and $m = 2^B$. From (80), it is clear that quantifying $\Delta_{1,\mathrm{sk}}$ requires the distribution function of the ratio of weighted norms of isotropically distributed unit-norm vectors. This is a hard problem in general, unless $\mathbf{A}$ has some underlying structure that can be exploited. Of course, imposing structure on $\mathbf{A}$ cannot help solve (78), which is an unconstrained optimization problem.
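Although the distribution in (80) is hard to characterize, (77) is easy to estimate by simulation. The minimal Monte Carlo sketch below works in the eigenbasis of $\mathbf{H}^\dagger\mathbf{H} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\dagger$. Replacing $\mathbf{f}$ by $\mathbf{U}\mathbf{f}'$ (still isotropic) and $\mathbf{A}$ by $\mathbf{U}^\dagger\mathbf{A}\mathbf{U}$ leaves the ratio in (77) unchanged, so the argument `a` below stands for $\mathbf{U}^\dagger\mathbf{A}\mathbf{U}$.

The three example matrices are hypothetical choices for a rank-1 $N_t = 2$ channel. They illustrate that a skew aligned with the dominant eigenvector helps, while a misaligned one can do worse than plain RVQ.

```python
import math
import random


def isotropic_unit_vector(n, rng):
    """Unit-norm vector drawn uniformly from the complex sphere in C^n."""
    z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in z))
    return [c / norm for c in z]


def matvec(a, v):
    return [sum(a[r][c] * v[c] for c in range(len(v))) for r in range(len(a))]


def skewed_loss_mc(lam, a, bits, trials, rng):
    """Monte Carlo estimate of the relative loss (77) with H^H H = diag(lam).

    Each trial draws a fresh RVQ codebook, skews every codeword by `a` (the
    normalization in (76) cancels in the ratio) and keeps the best ratio.
    """
    m = 2 ** bits
    acc = 0.0
    for _ in range(trials):
        best = 0.0
        for _ in range(m):
            g = matvec(a, isotropic_unit_vector(len(lam), rng))  # A f_i
            num = sum(l * abs(c) ** 2 for l, c in zip(lam, g))  # f^H A^H Lam A f
            den = sum(abs(c) ** 2 for c in g)                   # f^H A^H A f
            best = max(best, num / den)
        acc += best
    return 1.0 - acc / trials / max(lam)


rng = random.Random(2)
lam = [1.0, 0.0]                          # rank-1 channel
identity = [[1.0, 0.0], [0.0, 1.0]]       # plain RVQ
aligned = [[1.0, 0.0], [0.0, 0.1]]        # favors the dominant eigenvector
misaligned = [[0.1, 0.0], [0.0, 1.0]]     # favors the wrong direction
for name, a in (("RVQ", identity), ("aligned", aligned), ("misaligned", misaligned)):
    print(name, round(skewed_loss_mc(lam, a, 2, 4000, rng), 3))
```

For the rank-1 channel with $\mathbf{A} = \mathbf{I}$, the best ratio is the maximum of $m$ uniform variables, so the RVQ loss is close to $1/(m+1) = 0.2$ for $B = 2$.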



B. Main Result 

We overcome this technical difficulty by first studying the special case of $N_t = 2$. We then extend the intuition obtained from the $N_t = 2$ case to the more general case.

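Proposition 4: In the special case of $N_t = 2$, $\Delta_{1,\mathrm{sk}}$ can be bounded by $\overline{\Delta}_{1,\mathrm{sk}}$, which behaves as $B \rightarrow \infty$ as:

$$\overline{\Delta}_{1,\mathrm{sk}} \asymp 2^{-B}\left(1 - \frac{\lambda_2(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})}{\lambda_1(\mathbf{H}^\dagger\mathbf{H}) \cdot \lambda_1(\mathbf{A}\mathbf{A}^\dagger)}\right). \tag{81}$$

Proof: The first step establishes a simplified version of (80). The second step bounds $\Delta_{1,\mathrm{sk}}$ by an appropriate $\overline{\Delta}_{1,\mathrm{sk}}$ and captures its asymptotic trend. See Appendix H for details. ■

Note that in the case of no skewing ($\mathbf{A} = \mathbf{I}$), (81) reduces to the result in Theorem 1.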
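For $N_t = 2$, the penalty term whose minimization is discussed after Prop. 4 can be expressed through squared condition numbers. We read that term as $C_1 = 1 - \lambda_2(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})/(\lambda_1(\mathbf{H}^\dagger\mathbf{H})\lambda_1(\mathbf{A}\mathbf{A}^\dagger))$; this is our reconstruction of the extracted formulas. The identity $\det(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A}) = \det(\mathbf{H}^\dagger\mathbf{H})\det(\mathbf{A}^\dagger\mathbf{A})$ then gives $C_1 = 1 - (\chi_{\mathbf{H}}\chi_{\mathbf{H}\mathbf{A}}\chi_{\mathbf{A}})^{-1/2}$.

The sketch below checks this numerically for random nonsingular $2 \times 2$ complex matrices. It is a consistency check of the algebra, not part of the paper's proof, and the helper names are hypothetical.

```python
import math
import random


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def herm(x):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(x[j][i]).conjugate() for j in range(2)] for i in range(2)]


def eig2(m):
    """Eigenvalues (largest first) of a 2x2 Hermitian matrix."""
    a, d, b = m[0][0].real, m[1][1].real, m[0][1]
    mid, rad = (a + d) / 2.0, math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    return mid + rad, mid - rad


def c1_two_ways(h, a):
    """C1 computed directly and via the product of squared condition numbers."""
    g = mat_mul(herm(h), h)              # H^H H
    s = mat_mul(herm(a), a)              # A^H A (same spectrum as A A^H)
    k = mat_mul(herm(a), mat_mul(g, a))  # A^H H^H H A
    (g1, g2), (s1, s2), (k1, k2) = eig2(g), eig2(s), eig2(k)
    direct = 1.0 - k2 / (g1 * s1)
    via_chi = 1.0 - 1.0 / math.sqrt((g1 / g2) * (k1 / k2) * (s1 / s2))
    return direct, via_chi


rng = random.Random(0)
rand = lambda: [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(2)]
print(c1_two_ways(rand(), rand()))
```

For a fixed channel, $\chi_{\mathbf{H}}$ is constant, so reducing $C_1$ amounts to reducing the product $\chi_{\mathbf{H}\mathbf{A}}\chi_{\mathbf{A}}$. This is the tension between the two conditioning objectives described in the text.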

Theorem 4: For the $N_t = 3$ case, the dominant term of an upper bound to $\Delta_{1,\mathrm{sk}}$ behaves as:



Ai,sk-Ai(HtH) < Ai,sk-Ai(HtH) 

v/^ /Ai(AtHtHA) 
Ai(AtA) 



(82) 



2-- 



ajh^h: 



D. 



sk 



\ / Ai(HtH)Ai(AAt) \ 
(l-Z}sk)(iVt-l); V Ai(AtHtHA) J 



where 



sk 



1 _ ^ Ai(AtHtHA) - Ajv,(HtH) ■ Ai(AtA) 
i=2 



Ai(AtHtHA) - Aj(AtHtHA) 



(2-^/2) (83) 



(84) 



and $G(\cdot)$ is a monotonically increasing function, whose structure is provided in (263)-(264) in Appendix I.



If $N_t \geq 4$, the asymptotic behavior (in $B$) of $\Delta_{1,\mathrm{sk}}$ is as follows:

$$\Delta_{1,\mathrm{sk}} \leq \overline{\Delta}_{1,\mathrm{sk}} \tag{85}$$
(^^ Ak ^ ( Ai(AtHtHA) AA.,(HtH) 



Nt-l V (l-I^sk)(iVt-l); VAi(AtA)-Ai(HtH) Ai(HtH 



V 

^1, sk, asymp 



+ o(^2"^) (86) 

where $\kappa = \Gamma\left(\frac{1}{N_t - 1}\right)$ and $D_{\mathrm{sk}}$ is as in (84).

Proof: See Appendix I. ■

C. Insights on Aopt 

While solving for $\mathbf{A}_{\mathrm{opt}}$ in (78) appears to be difficult, we now develop some insights on its structure.

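1) From (1), recall that the system model (conditioned on $\mathbf{H}$) with the beamforming vector of index $i$ from $\mathcal{C}_{\mathrm{sk}}$ reduces to

$$\mathbf{y} = \frac{\mathbf{H}\mathbf{A}\mathbf{f}_i}{\sqrt{\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{A}\mathbf{f}_i}}\, s + \mathbf{n}. \tag{87}$$

By treating $\mathbf{H}\mathbf{A}$ as the effective channel in (87), an application of Theorem 2 suggests that $\Delta_{1,\mathrm{sk}}$ is minimized if $\mathbf{H}\mathbf{A}$ is well-conditioned. However, this argument is rigorous only if $\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{A}\mathbf{f}_i$ can be treated as a constant for all $i$, so that $\mathbf{A}$ does not arbitrarily scale the power of the effective channel.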

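2) In the special case of $N_t = 2$, Lemma 2 shows that $\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{A}\mathbf{f}_i$ is uniformly distributed over the interval $[\lambda_2(\mathbf{A}^\dagger\mathbf{A}), \lambda_1(\mathbf{A}^\dagger\mathbf{A})]$. Hence, well-conditioning of $\mathbf{A}$ is necessary to ensure that $\mathbf{f}_i^\dagger\mathbf{A}^\dagger\mathbf{A}\mathbf{f}_i$ is approximately constant for all $\mathbf{f}_i$. Thus, there is a tension between two objectives in choosing $\mathbf{A}$: well-conditioning of $\mathbf{H}\mathbf{A}$ and well-conditioning of $\mathbf{A}$. Prop. 4 makes this intuition more concrete. From (81), it is clear that $\mathbf{A}$ should be chosen such that $C_1$, defined as

$$C_1 = 1 - \frac{\lambda_2(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})}{\lambda_1(\mathbf{H}^\dagger\mathbf{H}) \cdot \lambda_1(\mathbf{A}\mathbf{A}^\dagger)}, \tag{88}$$

is minimized. Since $\det(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A}) = \det(\mathbf{H}^\dagger\mathbf{H})\det(\mathbf{A}^\dagger\mathbf{A})$, for $N_t = 2$ we have $C_1 = 1 - (\chi_{\mathbf{H}}\chi_{\mathbf{H}\mathbf{A}}\chi_{\mathbf{A}})^{-1/2}$. Minimizing $C_1$ is therefore equivalent to minimizing the product of the two squared condition numbers, $\chi_{\mathbf{H}\mathbf{A}} = \frac{\lambda_1(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})}{\lambda_2(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})}$ and $\chi_{\mathbf{A}} = \frac{\lambda_1(\mathbf{A}^\dagger\mathbf{A})}{\lambda_2(\mathbf{A}^\dagger\mathbf{A})}$. While a particular choice of $\mathbf{A}$ may make $\mathbf{H}\mathbf{A}$ more well-conditioned than $\mathbf{H}$, this choice may not necessarily correspond to a well-conditioned $\mathbf{A}$ (and vice versa).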

3) In the $N_t = 3$ case, a further upper bound (up to a multiplicative constant) to the asymptotic trend in (83) of Theorem 4 is

^^ A2(AtHtHA)\ A3(AtHtHA)\ 
Oil) (,l-M(AtHtHA)j-(,l-A.(AtHtHA)j ^ Ai(HtH) ■ Ai(AAt) ^ 

- 1 A,(AAt)A3(HtH) Ai(AtHtHA) J- ^ ^ 



Minimizing the term in (89) is equivalent to jointly minimizing

, Ai(AtHtHA) , Ai(AAt) 

= A3(AtHtHA) = MAtHtHA) • ^^^^ 

4) Consider the $N_t \geq 4$ case. Recasting Theorem 4, it can be seen that $\Delta_{1,\mathrm{sk,asymp}}$ is minimized if

$$(N_t - 2 + \xi_4) \cdot \xi_5 \tag{91}$$

is also minimized, where

. ATI Ai(AtHtHA)-A,(AtHtHA) 

Al Ai(AtHtHA) - Ajv,(HtH) ■ Ai(AtA) ^ ^ 

^ A Ai(AtHtHA)-A^,(HtH)-Ai(AtA) 

- A^pA^ ■ ^^^^ 



In the large-$N_t$ regime, observing that

Ai(AtHtHA) - Aj(AtHtHA) 



> 1 (94) 



Ai(AtHtHA) - A7v,(HtH) ■ Ai(AtA) 
for all $j$, we have

$$N_t - 2 \ll \xi_4. \tag{95}$$

Thus, the dominant term of (91) in this regime is $\xi_4 \cdot \xi_5$.

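5) Combining and unifying the above discussion, a (heuristically) "good" candidate for $\mathbf{A}$ should jointly minimize, if possible, the two metrics $M_1$ and $M_2$, defined as

$$M_1 = 1 - \frac{\lambda_1(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})}{\lambda_1(\mathbf{A}\mathbf{A}^\dagger) \cdot \lambda_1(\mathbf{H}^\dagger\mathbf{H})} \tag{96}$$

$$M_2 = \frac{\lambda_1(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})}{\lambda_{N_t}(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})}. \tag{97}$$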

6) Conditioned on a channel realization $\mathbf{H}$, note that $M_1 \in [0, 1]$ whereas $M_2 \in [1, \infty)$. The smallest value (of 0) for $M_1$ is achieved with an $\mathbf{A}$ such that the eigenvectors of $\mathbf{A}\mathbf{A}^\dagger$ coincide with those of $\mathbf{H}^\dagger\mathbf{H}$ in the same order. With this choice, $M_2$ satisfies

$$M_2 = \chi_{\mathbf{H}} \cdot \chi_{\mathbf{A}}. \tag{98}$$

On the other hand, the smallest value (of 1) for $M_2$ is achieved with $\mathbf{A} = (\mathbf{H}^\dagger\mathbf{H})^{-1/2}$. With this choice, $M_1$ satisfies

$$M_1 = 1 - \frac{1}{\chi_{\mathbf{H}}}. \tag{99}$$

In other words, $M_1$ is minimized by a choice of $\mathbf{A}$ whose left singular vectors match the right singular vectors of the channel. In contrast, $M_2$ is minimized by a choice that inverts (or zero-forces) the channel. Thus, optimization over $\mathbf{A}$ combines these two conflicting objectives in an appropriate sense. However, it is not clear which of these objectives is more important.

Fig. 6. Performance of skewed codebooks as a function of B with a) an ill-conditioned channel realization, and b) a well-conditioned channel realization. Average performance of different families of skewed codebooks as a function of c) the parameter defining the skewing matrix classes, and d) B.
7) We now address this question via numerical studies for a fixed channel realization. In the first example, we consider an ill-conditioned channel with $N_t = N_r = 4$ and squared singular values [4 3 2 1]. We numerically search over $\mathbf{A}$ to minimize

$$C_e(\alpha) = \alpha \cdot M_1 + (1 - \alpha) \cdot M_2 \tag{100}$$

for an appropriate choice of $\alpha \in [0, 1]$ that determines the weights between the two objectives in (96)-(97). The extreme cases of minimizing $M_1$ (or $M_2$) alone are obtained by setting $\alpha = 1$ (or $\alpha = 0$) in (100).

In Fig. 6(a), we plot the performance of the skewed codebooks (as a function of $B$) with $\mathbf{A}$ designed to minimize $C_e(\alpha)$ for five choices of $\alpha$: i) $\alpha = 0$, ii) $\alpha = 0.25$, iii) $\alpha = 0.5$, iv) $\alpha = 0.75$, and v) $\alpha = 1$. The performance of the RVQ codebook ($\mathbf{A} = \mathbf{I}$) is also plotted. Fig. 6(a) shows that minimizing $M_1$ is more important than minimizing $M_2$. The skewed codebook designed with this objective significantly outperforms the RVQ codebook (without skewing).

In Fig. 6(b), we consider skewed codebooks designed for the same five choices of $\alpha$ in a well-conditioned channel with squared singular values [1.6 1.4 1.2 1]. As in Fig. 6(a), minimizing $M_1$ ($\alpha \approx 1$) is the more relevant objective for limited-feedback performance. Figs. 6(a) and (b) also show that a poorly designed skewing matrix (e.g., $\alpha \approx 0$) can perform significantly worse than RVQ.
8) With $\Delta\mathrm{SNR}_{\mathrm{rx}} = E[\Delta_{1,\mathrm{sk}}]$ as the new metric, the previous study motivates the following family of matrices (parameterized by the weight $\alpha \in [0, 1]$) for the design of skewed codebooks:

$$\mathbf{A}_{1,\alpha} = \arg\min_{\mathbf{A}} E[C_e(\alpha)] \tag{101}$$

$$= \arg\min_{\mathbf{A}}\; \alpha \cdot E[M_1] + (1 - \alpha) \cdot E[M_2]. \tag{102}$$

While the family of skewing matrices within the argument in (101)-(102) is well-defined, a closed-form expression for $\mathbf{A}_{1,\alpha}$ is hard to obtain.

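To overcome this difficulty, consider the regime where $\{N_t, N_r\} \rightarrow \infty$ with $\frac{N_t}{N_r} \rightarrow 0$. Using the channel hardening principle in this regime [21], [46], the eigenvectors of $\mathbf{A}\mathbf{A}^\dagger$ for the choice of $\mathbf{A}$ that minimizes $E[M_1]$ can be heuristically replaced with the eigenvectors of $E[\mathbf{H}^\dagger\mathbf{H}] = \boldsymbol{\Sigma}_t$. A suitable candidate for such an $\mathbf{A}$ is

$$\mathbf{A} = (\boldsymbol{\Sigma}_t)^{\beta} = \mathbf{U}_t(\boldsymbol{\Lambda}_t)^{\beta}\mathbf{U}_t^\dagger \tag{103}$$

for some choice of $\beta$ satisfying $\beta > 0$. Similarly, the choice of $\mathbf{A}$ that minimizes $E[M_2]$ can be heuristically replaced by

$$\mathbf{A} = (\boldsymbol{\Sigma}_t)^{-1/2} = \mathbf{U}_t(\boldsymbol{\Lambda}_t)^{-1/2}\mathbf{U}_t^\dagger. \tag{104}$$

We interpolate the two statistics-dependent candidates in (103) and (104) to obtain the following family of matrices for skewing:

$$\mathbf{A}_{2,\alpha,\beta} = \alpha \cdot (\boldsymbol{\Sigma}_t)^{\beta} + (1 - \alpha) \cdot (\boldsymbol{\Sigma}_t)^{-1/2} \tag{105}$$

$$= \mathbf{U}_t\left(\alpha(\boldsymbol{\Lambda}_t)^{\beta} + (1 - \alpha)(\boldsymbol{\Lambda}_t)^{-1/2}\right)\mathbf{U}_t^\dagger \tag{106}$$

for some $\alpha \in [0, 1]$ and $\beta > 0$. Note that the right-hand side of (106) can also be written as

$$\mathbf{A}_{2,\alpha,\beta} = \mathbf{U}_t\, h(\boldsymbol{\Lambda}_t)\, \mathbf{U}_t^\dagger = h(\boldsymbol{\Sigma}_t) \tag{107}$$

for an appropriate choice of the matrix function $h(\cdot)$. In this sense, (107) generalizes two earlier proposals: the skewing matrix of [25], obtained by setting $\alpha = 1$, $\beta = \frac{1}{2}$, and the matrix of [26], obtained by setting $\alpha = 1 = \beta$.

Note that minimizing $M_1$ requires the eigenvectors of $\mathbf{A}\mathbf{A}^\dagger$ to be in the same order as the eigenvectors of $\mathbf{H}^\dagger\mathbf{H}$. This constraint can only be ensured by setting $\beta > 0$ in (103).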
9) We now numerically study the $\Delta\mathrm{SNR}_{\mathrm{rx}}$ performance of codebooks obtained by skewing an RVQ codebook with the two families $\{\mathbf{A}_{1,\alpha}\}$ and $\{\mathbf{A}_{2,\alpha,\beta}\}$. We consider a Kronecker-product correlated channel with $\boldsymbol{\Sigma}_t = 1.6 \times \mathrm{diag}([4\ 3\ 2\ 1])$ and $\boldsymbol{\Sigma}_r = \mathrm{diag}([7\ 5\ 3\ 1])$. Note that the channel power is normalized as $\mathrm{Tr}(\boldsymbol{\Sigma}_t) = \mathrm{Tr}(\boldsymbol{\Sigma}_r) = 16$.

In the first study, we plot $\Delta\mathrm{SNR}_{\mathrm{rx}}$ as a function of $\alpha$ for the $\{\mathbf{A}_{2,\alpha,\beta}\}$ family in Fig. 6(c), for different values of $\beta$ and $B$. For all the $\{\beta, B\}$ combinations considered, the smallest value of $\Delta\mathrm{SNR}_{\mathrm{rx}}$ is achieved as $\alpha \rightarrow 1$. This justifies restricting attention to $\alpha = 1$ within the $\{\mathbf{A}_{2,\alpha,\beta}\}$ family in the following study.
10) In the second study, a numerical search over $\mathbf{A}$ is performed with three objectives: minimizing i) $E[M_1]$, ii) $E[M_1] + E[M_2]$, and iii) $E[M_2]$. These correspond to three choices from $\{\mathbf{A}_{1,\alpha}\}$: i) $\mathbf{A}_{1,\alpha=1}$, ii) $\mathbf{A}_{1,\alpha=0.5}$, and iii) $\mathbf{A}_{1,\alpha=0}$, respectively. Motivated by the study in Fig. 6(c), four other skewing matrices from the $\{\mathbf{A}_{2,\alpha=1,\beta}\}$ family are also considered: iv) $\mathbf{A}_{2,\alpha=1,\beta=0.5}$, v) $\mathbf{A}_{2,\alpha=1,\beta=1}$, vi) $\mathbf{A}_{2,\alpha=1,\beta=1.5}$, and vii) $\mathbf{A}_{2,\alpha=1,\beta=2}$. As stated previously, iv) and v) correspond to the skewing matrix choices proposed in [25] and [26], respectively.

Fig. 6(d) plots $\Delta\mathrm{SNR}_{\mathrm{rx}}$ (as a function of $B$) for these seven skewed codebooks as well as the RVQ codebook. We see that $\mathbf{A}_{1,\alpha=1}$ performs better than RVQ codebooks for small $B$ values ($B < 4$). However, as $B$ increases, the average performance with this choice of skewing matrix becomes worse than that of an RVQ codebook. On the other hand, both $\mathbf{A}_{1,\alpha=0.5}$ and $\mathbf{A}_{1,\alpha=0}$ perform worse than the RVQ scheme, confirming the importance of $M_1$ over $M_2$ in skewing matrix optimization.

We also see that the $\{\mathbf{A}_{2,\alpha=1,\beta}\}$ family improves over both the $\{\mathbf{A}_{1,\alpha}\}$ family and RVQ codebooks. Further, within the $\{\mathbf{A}_{2,\alpha=1,\beta}\}$ family, skewing matrices with $\beta > 1$ perform better than the choices $\beta = 0.5$ and $\beta = 1$.

In general, we observe that for fixed $\beta$, performance improves for any $B$ as $\alpha$ increases, and becomes independent of $\alpha$ for large values of $\beta$. For fixed $\alpha$, large $\beta$ is seen to be better for small $B$ values ($B \lesssim 3$), whereas $\beta = 1$ is robust for large $B$ values ($B \approx 7$-8). Similar behavior is observed with other choices of transmit and receive covariance matrices. These results support the observation in the literature that appropriately designed skewed codebooks can significantly outperform RVQ codebooks over correlated channels.
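The two extreme designs discussed in items 6) and 8) are easy to examine when $\mathbf{H}^\dagger\mathbf{H}$ and $\mathbf{A}$ are simultaneously diagonal. The sketch below assumes that structure. It reads the two metrics as $M_1 = 1 - \lambda_1(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})/(\lambda_1(\mathbf{A}\mathbf{A}^\dagger)\lambda_1(\mathbf{H}^\dagger\mathbf{H}))$ and $M_2 = \lambda_1(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})/\lambda_{N_t}(\mathbf{A}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{A})$, which is our reconstruction of (96)-(97).

The code builds the eigenvalues of $\mathbf{A}_{2,\alpha,\beta} = \alpha\boldsymbol{\Sigma}^{\beta} + (1-\alpha)\boldsymbol{\Sigma}^{-1/2}$, using the diagonal channel as a stand-in for $\boldsymbol{\Sigma}_t$. For $\alpha = 1$ it reproduces $M_1 = 0$, $M_2 = \chi_{\mathbf{H}}\chi_{\mathbf{A}}$, and for $\alpha = 0$ it gives $M_2 = 1$, $M_1 = 1 - 1/\chi_{\mathbf{H}}$. The function names are hypothetical.

```python
def skew_eigs(lam, alpha, beta):
    """Eigenvalues of alpha*Sigma^beta + (1-alpha)*Sigma^(-1/2) for Sigma = diag(lam) > 0."""
    return [alpha * l ** beta + (1.0 - alpha) * l ** -0.5 for l in lam]


def metrics(lam, a):
    """(M1, M2) for diagonal H^H H = diag(lam) and diagonal A = diag(a)."""
    eff = [abs(x) ** 2 * l for x, l in zip(a, lam)]   # eigenvalues of A^H H^H H A
    m1 = 1.0 - max(eff) / (max(abs(x) ** 2 for x in a) * max(lam))
    m2 = max(eff) / min(eff)
    return m1, m2


lam = [4.0, 3.0, 2.0, 1.0]   # squared singular values used in item 7)
chi_h = max(lam) / min(lam)
for alpha in (0.0, 0.5, 1.0):
    print(alpha, metrics(lam, skew_eigs(lam, alpha, 1.0)))
# alpha = 1, beta = 1: A = Sigma, so M1 = 0 and M2 = chi_H * chi_A = 4 * 16 = 64
# alpha = 0: A = Sigma^(-1/2), so (up to rounding) M2 = 1 and M1 = 1 - 1/chi_H = 0.75
```

Intermediate values of $\alpha$ trade one metric against the other. This mirrors the conflict between matching the channel's eigenvectors and inverting the channel.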



VII. Conclusion 

Limited-feedback communications has become an important component of 3G/4G cellular standardization efforts. However, performance analysis of limited-feedback schemes, especially under practical impairments such as channel correlation, has not received much attention in the literature. The main goal of this work is to study the ensemble properties of a $B$-bit RVQ codebook in the correlated MIMO setting. The metrics of interest are the received SNR loss ($\Delta\mathrm{SNR}_{\mathrm{rx}}$) and the loss in average mutual information ($\Delta I$), both relative to a perfect CSI scheme.

We computed the rate of decay of $\Delta\mathrm{SNR}_{\mathrm{rx}}$ and $\Delta I$ as a function of $B$ and the channel correlation profile. The rate of decay with $B$ conforms with similar results in the literature for i.i.d. MIMO/MISO/rank-1 MIMO channels [12], [15], [30]-[37]. However, our result applies to correlated MIMO channels of arbitrary rank and arbitrary choice of $B$.

For fixed $B$, the critical factor limiting RVQ performance is the condition number of the channel. We established that the channel correlation profile that minimizes the performance loss with an RVQ codebook is typically i.i.d.-like (spatially rich). The profile that maximizes the performance loss has rank 1 (spatially poor/sparse structure). This dependence of RVQ performance on the condition number should not be entirely surprising [20], [21], [31]. The RVQ codebook consists of isotropic beamforming vectors, and an i.i.d. channel has a dominant right singular vector that is also isotropic.

We then generalized our performance analysis to skewed codebooks, where the RVQ codebook is skewed by a fixed matrix and normalized to ensure unit norm. From this characterization, we showed that the structure of the optimal skewing matrix for limited-feedback beamforming is determined by a tension: well-conditioning of the effective channel versus well-conditioning of the skewing matrix. In particular, we established the criticality of matching between the left singular vectors of the skewing matrix and the right singular vectors of the channel. Using this insight, we constructed a class of statistics-dependent (more specifically, transmit covariance matrix-dependent) skewing matrices that result in significantly improved performance over RVQ codebooks.

The workhorse behind our study is the structure of the density function of the weighted norm of isotropically distributed unit-norm vectors. This tool plays an important role in other settings, such as precoder design for broadcast [47] and interference channels [48], and norm feedback in broadcast channels [17]. Notwithstanding the results of this paper, the characterization of the performance loss with skewed codebooks is incomplete.

Generalizing our toolkit to the density function of the ratio of weighted norms is important for several problems:
- establishing fundamental performance limits with skewed codebooks, which are linear by definition, as well as with non-linear skewed codebooks as constructed in [20];
- characterizing moment and distributional properties of the various performance metrics;
- identifying the structure of the optimal skewing matrix.

Other problems of interest in the single-user setting include:
- averaging the loss expressions over the channel randomness, to study the impact of the channel model (Kronecker vs. non-Kronecker) on performance;
- establishing possible majorization results for performance metrics as a function of the transmit and receive covariance matrix eigenvalues;
- the performance of higher-rank schemes [34], [49].

Extension of this study to the multi-user setting [15]-[19] is also of practical interest.



Appendix 

A. Proof of Lemma 1

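For the first statement, for any $\Theta = [\Theta_1, \ldots, \Theta_{N_t}]$, note from (8) that

$$\Pr\left(|\mathbf{f}_i(\pi_1)|^2, \ldots, |\mathbf{f}_i(\pi_{N_t})|^2 \in \Theta\right) = \int_{\{|\mathbf{f}_i(\pi_k)|^2 \in \Theta_k\}} \frac{\delta(\mathbf{f}_i^\dagger\mathbf{f}_i - 1)}{\mathrm{Area}(1)}\, d\mathbf{f}_i \tag{108}$$

$$= \frac{1}{\mathrm{Area}(1)} \cdot \mathrm{Area}\left(\mathbf{f}_i : \{|\mathbf{f}_i(\pi_k)|^2 \in \Theta_k\} \,\middle|\, \mathbf{f}_i^\dagger\mathbf{f}_i = 1\right) \tag{109}$$

where $\mathrm{Area}(1)$ is the area of the unit complex sphere (see (113)). Since $\mathbf{f}_i$ is isotropic on $\mathcal{G}(N_t, 1)$, (109) is circularly symmetric and hence independent of the permutation $\Pi$.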

For the second statement, the Rayleigh-Ritz relationship implies that the range of $x_i$ is $[\lambda_{N_t}, \lambda_1]$. The independence of $\{x_i, i = 1, \ldots, 2^B\}$ follows from the independence of $\{\mathbf{f}_i, i = 1, \ldots, 2^B\}$.

To prove that $\{x_i\}$ are also identically distributed, note that if $\{\mathbf{f}_i\}$ are isotropic and i.i.d., then so are $\{\mathbf{g}_i = \mathbf{U}^\dagger\mathbf{f}_i\}$ for any fixed unitary matrix $\mathbf{U}$. In this setting, the fixed unitary matrix is the eigenvector matrix of $\mathbf{H}^\dagger\mathbf{H}$ for a given realization $\mathbf{H}$, so that $\mathbf{H}^\dagger\mathbf{H} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\dagger$. The diagonal matrix $\boldsymbol{\Lambda} = \mathrm{diag}([\lambda_1, \ldots, \lambda_{N_t}])$ is in general not the identity matrix. For any fixed $k$, $\{|\mathbf{g}_i(k)|^2, i = 1, \ldots, 2^B\}$ are identically distributed since $\{\mathbf{g}_i\}$ are i.i.d. The conclusion follows since $x_i = \sum_k |\mathbf{g}_i(k)|^2 \lambda_k$. ■

B. Proof of Lemma 2

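Following the derivation of the density function of $\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f}$ when $\lambda_2 = \cdots = \lambda_{N_t} = 0$ in [8], we have

$$P(x) = \Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} = x) = \frac{d}{dx}\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} \leq x) \tag{110}$$

with

$$\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} \leq x) = 1 - \frac{\mathrm{Area}(x, 1)}{\mathrm{Area}(1)} \tag{111}$$

where

$$\mathrm{Area}(x, y) \triangleq \mathrm{Area}\left(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} \geq x,\; \|\mathbf{f}\|^2 = y\right) \quad \text{and} \tag{112}$$

$$\mathrm{Area}(y) \triangleq \mathrm{Area}\left(\|\mathbf{f}\|^2 = y\right) \tag{113}$$

denote, respectively, the area of a (unit-radius) sphere carved out by the ellipsoid $\{\mathbf{f} : \mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} = x\}$ and the area of a (unit-radius) complex sphere. The volumes of the objects needed in the computation of $P(x)$ are

$$\mathrm{Vol}(x, r^2) \triangleq \mathrm{Vol}\left(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} \geq x,\; \|\mathbf{f}\|^2 \leq r^2\right) \tag{114}$$

$$= \int_{y=0}^{r^2} \mathrm{Area}(x, y)\, dy \quad \text{and} \tag{115}$$

$$\mathrm{Vol}(r^2) \triangleq \mathrm{Vol}\left(\|\mathbf{f}\|^2 \leq r^2\right) = \int_{x=0}^{r^2} \mathrm{Area}(x)\, dx. \tag{116}$$

Thus, we have

$$\mathrm{Area}(x, 1) = \frac{\partial\, \mathrm{Vol}(x, r^2)}{\partial r^2}\bigg|_{r=1}, \tag{117}$$

$$\mathrm{Area}(1) = \frac{\partial\, \mathrm{Vol}(r^2)}{\partial r^2}\bigg|_{r=1} \tag{118}$$

and hence,

$$P(x) = -\frac{\frac{\partial^2 \mathrm{Vol}(x, r^2)}{\partial x\, \partial r^2}\Big|_{r=1}}{\frac{\partial\, \mathrm{Vol}(r^2)}{\partial r^2}\Big|_{r=1}}. \tag{119}$$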



Computing $\mathrm{Vol}(x, r^2)$ is non-trivial even in the simple case of $N_t = 2$, because every additional complex dimension of the ellipsoid adds two real dimensions. In the simplest case of $N_t = 2$, we have the intersection of two four-dimensional real objects, which cannot be visualized pictorially. Nevertheless, the following lemma captures the complete structure of $P(x)$ when $N_t = 2$. The general case follows subsequently.

Lemma 3: If $N_t = 2$, the random variable $\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f}$ is uniformly distributed in the interval $[\lambda_2, \lambda_1]$. ■

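Proof: First, note that it follows from [8, Lemma 2] that

$$\mathrm{Vol}(r^2) = \frac{\pi^2 r^4}{2}. \tag{120}$$

For computing $\mathrm{Vol}(x, r^2)$, we follow the same variable transformation as in [8]. We set $\mathbf{f}(k) = r_k \exp(j\theta_k)$ for $k = 1, 2$. The ellipsoid is contained completely in the sphere of radius $r$ if $r \geq \sqrt{x/\lambda_2}$, whereas the sphere is contained completely in the ellipsoid if $r \leq \sqrt{x/\lambda_1}$. In the intermediate regime for $r$, the two objects intersect non-trivially, and the volume can be computed by a two-dimensional integration as follows:

$$\mathrm{Vol}(x, r^2) = \iint_{\mathcal{A} \times \mathcal{B}} r_1 r_2\, dr_1\, dr_2\, d\theta_1\, d\theta_2 \tag{121}$$

$$= (2\pi)^2 \cdot \iint_{\mathcal{A}} r_1\, dr_1\, r_2\, dr_2 \tag{122}$$

$$= (2\pi)^2 \cdot \int_{0}^{\overline{r}_2} r_2\, dr_2 \int_{L'}^{U'} r_1\, dr_1 \tag{123}$$

where

$$\mathcal{A} = \{r_1, r_2 : r_1^2\lambda_1 + r_2^2\lambda_2 \geq x,\; r_1^2 + r_2^2 \leq r^2\}, \tag{124}$$

$$\mathcal{B} = \{\theta_1, \theta_2 : \theta_1, \theta_2 \in [0, 2\pi)\}, \tag{125}$$

$$L' = \sqrt{\frac{x - r_2^2\lambda_2}{\lambda_1}}, \tag{126}$$

$$U' = \sqrt{r^2 - r_2^2}, \tag{127}$$

$$\overline{r}_2 = \sqrt{\frac{\lambda_1 r^2 - x}{\lambda_1 - \lambda_2}}. \tag{128}$$

Straightforward computation from (123) establishes the following:

$$\mathrm{Vol}(x, r^2) = \begin{cases} 0, & r < \sqrt{x/\lambda_1} \\ \dfrac{\pi^2(\lambda_1 r^2 - x)^2}{2\lambda_1(\lambda_1 - \lambda_2)}, & \sqrt{x/\lambda_1} \leq r \leq \sqrt{x/\lambda_2} \\ \dfrac{\pi^2 r^4}{2} - \dfrac{\pi^2 x^2}{2\lambda_1\lambda_2}, & r > \sqrt{x/\lambda_2}. \end{cases} \tag{129}$$

Using (119), another trivial computation shows that

$$P(x) = \frac{1}{\lambda_1 - \lambda_2}, \quad \lambda_2 \leq x \leq \lambda_1. \tag{130}$$

That is, $\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f}$ is uniformly distributed in its range. ■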
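Lemma 3 is easy to check by simulation. The sketch below draws $\mathbf{f}$ uniformly on the unit sphere of $\mathbb{C}^2$ by normalizing a complex Gaussian vector. It then compares the sample mean, variance and quartiles of $\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f}$ with those of the uniform law on $[\lambda_2, \lambda_1]$. The eigenvalues $[2, 1]$, the seed and the sample size are arbitrary choices.

```python
import random


def sample_quadratic_form(lam, rng):
    """One draw of f^H diag(lam) f for f uniform on the unit sphere of C^len(lam)."""
    z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in lam]
    norm2 = sum(abs(c) ** 2 for c in z)
    return sum(l * abs(c) ** 2 for l, c in zip(lam, z)) / norm2


rng = random.Random(8)
lam1, lam2 = 2.0, 1.0
draws = [sample_quadratic_form([lam1, lam2], rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
below_mid = sum(d <= (lam1 + lam2) / 2 for d in draws) / len(draws)
print(mean, var, below_mid)   # the uniform law predicts 1.5, 1/12 and 0.5
```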
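Lemma 4: This lemma states (without proof) the structure of the density function $P(x)$ in the cases $N_t = 3$ and $N_t = 4$. With $N_t = 3$, we have

$$P(x) = \begin{cases} 0, & x < \lambda_3 \\ \dfrac{2(x - \lambda_3)}{(\lambda_1 - \lambda_3)(\lambda_2 - \lambda_3)}, & \lambda_3 \leq x \leq \lambda_2 \\ \dfrac{2(\lambda_1 - x)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)}, & \lambda_2 \leq x \leq \lambda_1 \\ 0, & x > \lambda_1. \end{cases} \tag{131}$$

With $N_t = 4$, we have

$$P(x) = \begin{cases} 0, & x < \lambda_4 \\ \dfrac{3(x - \lambda_4)^2}{(\lambda_1 - \lambda_4)(\lambda_2 - \lambda_4)(\lambda_3 - \lambda_4)}, & \lambda_4 \leq x \leq \lambda_3 \\ K_2, & \lambda_3 \leq x \leq \lambda_2 \\ \dfrac{3(\lambda_1 - x)^2}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)(\lambda_1 - \lambda_4)}, & \lambda_2 \leq x \leq \lambda_1 \\ 0, & x > \lambda_1 \end{cases} \tag{132}$$

where

$$K_2 = \frac{3}{(\lambda_1 - \lambda_3)(\lambda_2 - \lambda_4)}\left[\frac{(x - \lambda_3)(\lambda_2 - x)}{\lambda_2 - \lambda_3}\right. \tag{133}$$

$$\left. + \frac{(x - \lambda_4)(\lambda_1 - x)}{\lambda_1 - \lambda_4}\right]. \tag{134}$$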
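The piecewise densities in Lemma 4 can be checked in two ways. First, each density should integrate to one. Second, its mass below $\lambda_2$ should match a simulation in which the weights $|\mathbf{f}(k)|^2$ are drawn uniformly on the simplex (normalized exponential variables).

The sketch below takes the $N_t = 4$ middle piece as $K_2$ with the factor 3 shown above, which is our reading of (133)-(134). Because every piece is a polynomial of degree at most 2, Simpson's rule applied piece by piece is exact up to rounding. The eigenvalue choices and helper names are illustrative.

```python
import random


def density_nt3(x, l1, l2, l3):
    """P(x) for N_t = 3 (requires l1 > l2 > l3)."""
    if l3 <= x <= l2:
        return 2.0 * (x - l3) / ((l1 - l3) * (l2 - l3))
    if l2 < x <= l1:
        return 2.0 * (l1 - x) / ((l1 - l2) * (l1 - l3))
    return 0.0


def density_nt4(x, l1, l2, l3, l4):
    """P(x) for N_t = 4 (requires l1 > l2 > l3 > l4)."""
    if l4 <= x <= l3:
        return 3.0 * (x - l4) ** 2 / ((l1 - l4) * (l2 - l4) * (l3 - l4))
    if l3 < x <= l2:
        k2 = (x - l3) * (l2 - x) / (l2 - l3) + (x - l4) * (l1 - x) / (l1 - l4)
        return 3.0 * k2 / ((l1 - l3) * (l2 - l4))
    if l2 < x <= l1:
        return 3.0 * (l1 - x) ** 2 / ((l1 - l2) * (l1 - l3) * (l1 - l4))
    return 0.0


def simpson(g, lo, hi, steps=200):
    """Composite Simpson's rule; exact (up to rounding) for cubic or lower pieces."""
    h = (hi - lo) / steps
    s = g(lo) + g(hi) + sum((4 if k % 2 else 2) * g(lo + k * h) for k in range(1, steps))
    return s * h / 3.0


def prob_mass(density, lam, hi):
    """Integrate the density piece by piece from min(lam) up to the breakpoint hi."""
    pts = sorted(p for p in lam if p <= hi)
    return sum(simpson(lambda x: density(x, *lam), a, b) for a, b in zip(pts, pts[1:]))


def empirical_cdf(lam, x, n, rng):
    """Monte Carlo Pr(f^H Lam f <= x) with weights uniform on the simplex."""
    hits = 0
    for _ in range(n):
        e = [rng.expovariate(1.0) for _ in lam]
        hits += sum(l * w for l, w in zip(lam, e)) / sum(e) <= x
    return hits / n


lam3, lam4 = (3.0, 2.0, 1.0), (4.0, 3.0, 2.0, 1.0)
rng = random.Random(6)
print(prob_mass(density_nt3, lam3, 3.0), prob_mass(density_nt4, lam4, 4.0))  # both 1 up to rounding
print(prob_mass(density_nt4, lam4, 3.0), empirical_cdf(lam4, 3.0, 20000, rng))
```

As a further check, the $N_t = 4$ pieces agree at the breakpoints $\lambda_3$ and $\lambda_2$, so the reconstructed density is continuous.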



C. Proof of Theorem 1

As stated at the beginning of Sec. III, we compute $\Delta_1$ using Lemma 2. In the $N_t = 2$ case, the computation of $\Delta_1$ is a straightforward integration.

For the $N_t = 3$ case, we split the integral into two parts over the intervals $[\lambda_3, \lambda_2]$ and $[\lambda_2, \lambda_1]$. The integral over the first interval is again straightforward and contributes

$$\frac{\lambda_2 - \lambda_3}{2m + 1}\left(\frac{\lambda_2 - \lambda_3}{\lambda_1 - \lambda_3}\right)^m. \tag{135}$$

Upon elementary manipulation, the integral over the second interval can be shown to be equivalent to

$$\sqrt{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)} \int_0^{\sqrt{\frac{\lambda_1 - \lambda_2}{\lambda_1 - \lambda_3}}} (1 - y^2)^m\, dy \tag{136}$$

which can be computed in closed form using integral tables [50, 2.512(3), p. 131] via the transformation $y \mapsto \sin(\theta)$. Combining the two terms, we have the expression for $\Delta_1$ in the statement of the theorem.

For the $N_t \geq 4$ case, exact computation of $\Delta_1$ is cumbersome. Since the distribution function $F(x)$ is monotonically increasing, the dominant trend (and term) of $\Delta_1$ is captured by the integral over the segment $[\lambda_2, \lambda_1]$ alone. This integral can be computed in closed form because $F(x)$ is tractable on this interval. Upon elementary transformations, the integral becomes:



cos2"+^(^)sinP(0) d9 



(137) 



with p 



Nt-l 



{Nt - 1) 



' Nt 

n 

.i=2 



1 - 



Ai 



Or, 



sm 



Nt 

n 



Ai — A2 
A^ 



A, 



(138) 



(139) 



Again, using the integral tables [50, 2.511(4), p. 131], we can compute (137) in closed form as in the statement of the theorem.



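It is obvious that $\Delta_{1,\mathrm{appx}} \leq \Delta_1$. For the other side of (30), note that

$$\frac{\Delta_1}{\Delta_{1,\mathrm{appx}}} = \frac{\int_{\lambda_{N_t}}^{\lambda_1} \left(\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < x)\right)^m dx}{\int_{\lambda_2}^{\lambda_1} \left(\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < x)\right)^m dx} \tag{140}$$

$$= 1 + \frac{\int_{\lambda_{N_t}}^{\lambda_2} \left(\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < x)\right)^m dx}{\Delta_{1,\mathrm{appx}}} \tag{141}$$

$$\leq 1 + \frac{\left(\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < \lambda_2)\right)^m (\lambda_2 - \lambda_{N_t})}{\Delta_{1,\mathrm{appx}}} \tag{142}$$

$$= 1 + \frac{(\lambda_2 - \lambda_{N_t})\, D^m}{\Delta_{1,\mathrm{appx}}} \tag{143}$$

where the third step bounds the distribution function by its largest value on the interval, attained at $x = \lambda_2$. The last step uses (25), which gives

$$\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < \lambda_2) = D. \tag{144}$$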
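The steps of this appendix can be checked numerically for $N_t = 3$. The sketch integrates $(\Pr(\mathbf{f}^\dagger\boldsymbol{\Lambda}\mathbf{f} < x))^m$, obtaining the distribution function by integrating the $N_t = 3$ density of Lemma 4. It splits the integral at $\lambda_2$ and compares the two pieces with the closed forms we read for (135) and (136).

The first piece should equal $\frac{\lambda_2-\lambda_3}{2m+1}\left(\frac{\lambda_2-\lambda_3}{\lambda_1-\lambda_3}\right)^m$. The second should equal $\sqrt{(\lambda_1-\lambda_2)(\lambda_1-\lambda_3)}\int_0^{\sqrt{(\lambda_1-\lambda_2)/(\lambda_1-\lambda_3)}}(1-y^2)^m dy$. The code then checks the sandwich $\Delta_{1,\mathrm{appx}} \leq \Delta_1 \leq \Delta_{1,\mathrm{appx}} + (\lambda_2 - \lambda_{N_t})D^m$ implied by (140)-(144). The eigenvalues $[3, 2, 1]$, $B = 3$ and the helper names are arbitrary choices.

```python
import math


def cdf_nt3(x, l1, l2, l3):
    """Pr(f^H Lam f < x) for N_t = 3 (integral of the Lemma 4 density)."""
    if x <= l3:
        return 0.0
    if x <= l2:
        return (x - l3) ** 2 / ((l1 - l3) * (l2 - l3))
    if x <= l1:
        return 1.0 - (l1 - x) ** 2 / ((l1 - l2) * (l1 - l3))
    return 1.0


def simpson(g, lo, hi, steps=2000):
    """Composite Simpson's rule (steps must be even)."""
    h = (hi - lo) / steps
    s = g(lo) + g(hi) + sum((4 if k % 2 else 2) * g(lo + k * h) for k in range(1, steps))
    return s * h / 3.0


l1, l2, l3, bits = 3.0, 2.0, 1.0, 3
m = 2 ** bits
F_m = lambda x: cdf_nt3(x, l1, l2, l3) ** m
low_part = simpson(F_m, l3, l2)   # contribution of [lambda_3, lambda_2]
d1_appx = simpson(F_m, l2, l1)    # Delta_{1,appx}: contribution of [lambda_2, lambda_1]
d1 = low_part + d1_appx           # Delta_1
D = cdf_nt3(l2, l1, l2, l3)       # Pr(f^H Lam f < lambda_2), cf. (144)

closed_135 = (l2 - l3) / (2 * m + 1) * ((l2 - l3) / (l1 - l3)) ** m
c = math.sqrt((l1 - l2) * (l1 - l3))
closed_136 = c * simpson(lambda y: (1 - y * y) ** m, 0.0, math.sqrt((l1 - l2) / (l1 - l3)))
print(low_part, closed_135, d1_appx, closed_136)
print(d1_appx <= d1 <= d1_appx + (l2 - l3) * D ** m)   # the sandwich from (140)-(143)
```

For $N_t = 3$, the first piece is exactly $(\lambda_2 - \lambda_3)D^m/(2m+1)$. The bound in (143) therefore overestimates the correction only by the factor $2m + 1$.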



D. Proof of Prop. 1

In the general MIMO setting with $N_t = 3$, we have

2'^+^m(m — 1) ■ ■ ■ {m — k) 



h{m) 



(2m- l)(2m-3)---(2m-2A;- 1) h{m 
where $h(\cdot)$ is a function defined on the set of integers as



h{m) 



m! 



■ 2 



2m 



2m! 



(145) 



(146) 



Using Stirling's formula [51, 6.1.39, p. 257] to approximate the factorial function as $m = 2^B$ increases, we obtain a good estimate of the trend of $h(m)$. This in turn estimates the summation in the characterization of $\Delta_1$ in Theorem 1. Retaining the dominant terms, we can write $\Delta_1$ as



Ai 



In the $N_t \geq 4$ case, we have

2'' ■ m(m — 1) ■ ■ ■ 



fm — A; + 11 



At — Aa 



2(Ai - A3) 
m! ■ r + m — k 



(147) 



1 ij=m—k 



r 



1 



Nt-l 



+ m 



m 



k\ 



,(p + l + 2j) 

where $\Gamma(\cdot)$ stands for the Gamma function. With $k = m$, the above equation simplifies to



(148) 



where the asymptotic trend follows from Stirling's formula. For $1 \leq k \leq m - 1$, we have
m! ■ r I , + ^ — A; ) „ , T [m — k + -, 



r ^]y^ + fn^ ■ m — k\ 



m 



T(m-k + l) 



Using the trivial inequality r(m-fc+i) - k = T j , we have 



A 



Nt 



K-2 ^t- 



1 + 
1 + ^ 



D 



{i-Dm-i)j\ 

D 



l-D){Nt-l] 



(151) 



(152) 
(153) 



E. Proof of Theorem 2

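First, note from (15) that (47) is equivalent to showing that

$$\frac{E_{\mathcal{C}}\left[\max_i \mathbf{f}_i^\dagger \mathbf{H}_1^\dagger \mathbf{H}_1 \mathbf{f}_i\right]}{\lambda_1} \geq \frac{E_{\mathcal{C}}\left[\max_i \mathbf{f}_i^\dagger \mathbf{H}_2^\dagger \mathbf{H}_2 \mathbf{f}_i\right]}{\mu_1}. \tag{154}$$

Using the eigen-decompositions

$$\mathbf{H}_1^\dagger \mathbf{H}_1 = \mathbf{U}_1 \operatorname{diag}(\boldsymbol{\lambda}) \mathbf{U}_1^\dagger, \qquad \mathbf{H}_2^\dagger \mathbf{H}_2 = \mathbf{U}_2 \operatorname{diag}(\boldsymbol{\mu}) \mathbf{U}_2^\dagger \tag{155}$$

in (154), we have

$$\frac{E_{\mathcal{C}}\left[\max_i \mathbf{f}_i^\dagger \mathbf{U}_1 \operatorname{diag}(\boldsymbol{\lambda}) \mathbf{U}_1^\dagger \mathbf{f}_i\right]}{\lambda_1} \geq \frac{E_{\mathcal{C}}\left[\max_i \mathbf{f}_i^\dagger \mathbf{U}_2 \operatorname{diag}(\boldsymbol{\mu}) \mathbf{U}_2^\dagger \mathbf{f}_i\right]}{\mu_1}. \tag{156}$$

From Lemma 1, we note that $\{\mathbf{U}_1^\dagger \mathbf{f}_i\}$ and $\{\mathbf{U}_2^\dagger \mathbf{f}_i\}$ are i.i.d. and have the same distribution as $\{\mathbf{f}_i\}$. Thus, (156) is equivalent to showing that

$$\frac{E_{\mathcal{C}}\left[\max_i \mathbf{f}_i^\dagger \operatorname{diag}(\boldsymbol{\lambda}) \mathbf{f}_i\right]}{\lambda_1} \geq \frac{E_{\mathcal{C}}\left[\max_i \mathbf{f}_i^\dagger \operatorname{diag}(\boldsymbol{\mu}) \mathbf{f}_i\right]}{\mu_1}. \tag{157}$$

In other words, the proof is complete if we can show that

$$f(\boldsymbol{\lambda}) \triangleq \frac{E_{\mathcal{C}}\left[\max_i \sum_k |\mathbf{f}_i(k)|^2 \lambda_k\right]}{\lambda_1} \tag{158}$$

is a Schur-concave function of $\boldsymbol{\lambda}$.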

It is important to note that $f(\boldsymbol{\lambda})$ is a ratio of two Schur-convex functions. For the denominator, it is obvious that $\boldsymbol{\lambda} \prec \boldsymbol{\mu}$ implies $\lambda_1 \leq \mu_1$. On the other hand, the numerator of $f(\boldsymbol{\lambda})$ can be shown to be Schur-convex since $\max(\cdot)$ is a convex function of its argument. Without a standard recipe for studying the Schur-concavity of a ratio of Schur-convex functions, we resort to basic theory [42, A.2.b, p. 55], from which we can claim that $f(\cdot)$ is Schur-concave if and only if:

1) $f(\cdot)$ is symmetric in its indices. That is, $f(\boldsymbol{\lambda}) = f(\boldsymbol{\lambda}_{\Pi})$ for all permutations $\Pi = [\pi_1, \ldots, \pi_{N_t}]$.
2) $f([\lambda_1, s - \lambda_1, \lambda_3, \ldots, \lambda_{N_t}])$ is decreasing in $\lambda_1$ for all $\lambda_1 \geq s/2$ and any choice of $s, \lambda_3, \ldots, \lambda_{N_t}$.



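The first condition is straightforward since

\begin{align}
f(\boldsymbol{\lambda}) &= \frac{E_{\mathcal{C}}\left[\max_i \sum_k |\mathbf{f}_i(k)|^2 \lambda_k\right]}{\max_k \lambda_k} \tag{159} \\
&\overset{(a)}{=} \frac{E_{\mathcal{C}}\left[\max_i \sum_k |\mathbf{f}_i(\pi_k)|^2 \lambda_{\pi_k}\right]}{\max_k \lambda_{\pi_k}} \tag{160} \\
&\overset{(b)}{=} \frac{E_{\mathcal{C}}\left[\max_i \sum_k |\mathbf{f}_i(k)|^2 \lambda_{\pi_k}\right]}{\max_k \lambda_{\pi_k}} = f(\boldsymbol{\lambda}_{\Pi}) \tag{161}
\end{align}

where (a) follows from the symmetry of the sum function and (b) from the exchangeability of $|\mathbf{f}_i(k)|^2$ proved in Lemma 1. For the second condition, it can be seen that

\begin{align}
f([\lambda_1, s - \lambda_1, \lambda_3, \ldots, \lambda_{N_t}]) &= E_{\mathcal{C}}\left[ \max_i E_i \right], \tag{162} \\
E_i &= |\mathbf{f}_i(1)|^2 - |\mathbf{f}_i(2)|^2 + \frac{\sum_{k \geq 2} |\mathbf{f}_i(k)|^2 \delta_k}{\lambda_1} \tag{163}
\end{align}

where $\delta_2 = s$ and $\delta_k = \lambda_k$, $k \geq 3$. For every realization of $\{\mathbf{f}_i\}$ from the RVQ codebook and every choice of $s, \lambda_3, \ldots, \lambda_{N_t}$, all the functions $E_i$, $i = 1, \ldots, 2^B$, are decreasing in $\lambda_1$. Thus, the $\max(\cdot)$ function is also decreasing in $\lambda_1$. Averaging over the RVQ codebook, we arrive at the second condition. ■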
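A Monte Carlo sketch can illustrate, though not prove, the Schur-concavity of $f(\boldsymbol{\lambda})$ in (158). It assumes that the $|\mathbf{f}_i(k)|^2$ are generated as normalized i.i.d. $\mathrm{Exp}(1)$ variables, which matches the distribution of the squared magnitudes of an isotropic unit vector. It uses a hypothetical pair $\boldsymbol{\lambda} \prec \boldsymbol{\mu}$ and reuses the same random seed for both evaluations, so that the two estimates share the same codebook samples. The function name and parameters are illustrative.

```python
import random

def f_normalized(lam, m, trials, seed=0):
    """Monte Carlo estimate of f(lambda) in (158):
    E[max_i sum_k |f_i(k)|^2 lam_k] / max_k lam_k for m i.i.d. isotropic codewords.
    |f_i(k)|^2 is drawn as normalized i.i.d. Exp(1) variables (Dirichlet(1, ..., 1))."""
    rng = random.Random(seed)   # same seed -> common random numbers across calls
    total = 0.0
    for _ in range(trials):
        best = float("-inf")
        for _ in range(m):
            e = [rng.expovariate(1.0) for _ in lam]
            s = sum(e)
            best = max(best, sum(ek * lk for ek, lk in zip(e, lam)) / s)
        total += best
    return total / trials / max(lam)

lam = [2.0, 1.5, 0.5]   # hypothetical; lam is majorized by mu (equal sums, 2 <= 3, 3.5 <= 3.5)
mu = [3.0, 0.5, 0.5]
m = 4                   # B = 2
print(f_normalized(lam, m, 20_000), f_normalized(mu, m, 20_000))
```

For this particular $\boldsymbol{\mu}$ and $m = 4$, a short calculation gives $f(\boldsymbol{\mu}) = 625/945 \approx 0.661$, which the estimate should reproduce to about two decimal places.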



F. Proof of Theorem 3

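In the $N_t = 2$ case, $\delta = \Delta_2 \cdot \log_e(2)$ is written as

\begin{align}
\delta &= \frac{\rho}{(\lambda_1 - \lambda_2)^m} \int_{\lambda_2}^{\lambda_1} \frac{(x - \lambda_2)^m}{1 + \rho x} \, dx \tag{164} \\
&= \frac{\rho}{(\lambda_1 - \lambda_2)^m} \int_0^{\lambda_1 - \lambda_2} \frac{x^m}{1 + \rho\lambda_2 + \rho x} \, dx \tag{165} \\
&= \frac{1}{(\lambda_1 - \lambda_2)^m} \int_0^{\lambda_1 - \lambda_2} \left[ \sum_{t=0}^{m-1} (-s)^t x^{m-1-t} + \frac{(-s)^m}{x + s} \right] dx \tag{166} \\
&= \sum_{t=0}^{m-1} \frac{(-1)^t}{m-t} \left( \frac{s}{\lambda_1 - \lambda_2} \right)^t + \left( \frac{-s}{\lambda_1 - \lambda_2} \right)^m \log_e\left( \frac{1 + \rho\lambda_1}{1 + \rho\lambda_2} \right) \tag{167} \\
&= \frac{(-1)^m}{z^m} \left[ \log_e(1 + z) - \sum_{t=1}^{m} \frac{(-1)^{t+1} z^t}{t} \right] \tag{168}
\end{align}

where

$$s = \frac{1 + \rho\lambda_2}{\rho}, \qquad z = \frac{\lambda_1 - \lambda_2}{s} = \frac{\rho(\lambda_1 - \lambda_2)}{1 + \rho\lambda_2}. \tag{169}$$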
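The closed form (168)-(169) can be compared against direct numerical integration of (164). The sketch assumes hypothetical values of $\rho$, $\lambda_1$, $\lambda_2$ and a moderate $m$. For large $m = 2^B$ the alternating sum in (168) is prone to cancellation in floating point, so this check is only meaningful for small $m$. Function names are illustrative.

```python
import math

def delta_closed_form(rho, lam1, lam2, m):
    """(168)-(169): delta = (-1)^m z^(-m) [log(1+z) - sum_{t=1}^m (-1)^(t+1) z^t / t]."""
    z = rho * (lam1 - lam2) / (1.0 + rho * lam2)
    partial = sum((-1) ** (t + 1) * z ** t / t for t in range(1, m + 1))
    return (-1) ** m * (math.log1p(z) - partial) / z ** m

def delta_numeric(rho, lam1, lam2, m, n=200_000):
    """Midpoint rule applied directly to the integral in (164)."""
    h = (lam1 - lam2) / n
    acc = 0.0
    for j in range(n):
        x = lam2 + (j + 0.5) * h
        acc += rho / (1.0 + rho * x) * ((x - lam2) / (lam1 - lam2)) ** m
    return acc * h

# Hypothetical cases with z > 1 and z < 1, both with m = 8
print(delta_closed_form(2.0, 3.0, 1.0, 8), delta_numeric(2.0, 3.0, 1.0, 8))
print(delta_closed_form(0.5, 2.0, 1.0, 8), delta_numeric(0.5, 2.0, 1.0, 8))
```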



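In the general $N_t$ case, the dominant term of $\delta = \Delta_2 \cdot \log_e(2)$ is written as

\begin{align}
\delta &\approx \int_0^{\lambda_1 - \lambda_2} \frac{\rho}{1 + \rho\lambda_1 - \rho y} \left( 1 - \frac{y^{N_t - 1}}{A^{N_t - 1}} \right)^m dy \tag{170} \\
&= \int_0^{\frac{\lambda_1 - \lambda_2}{A}} \frac{\rho A \, (1 - y^{N_t - 1})^m}{1 + \rho\lambda_1 - \rho A y} \, dy \tag{171}
\end{align}

where $A = \left( \prod_{j \geq 2} (\lambda_1 - \lambda_j) \right)^{\frac{1}{N_t - 1}}$. There are two ways in which (171) can be computed: 1) replacing the denominator of the integrand by an appropriate geometric series, and 2) expanding the numerator of the integrand using the binomial theorem.

Method 1: With $\gamma = \frac{\rho A}{1 + \rho\lambda_1}$ and using the fact that $\gamma y \leq \frac{\rho(\lambda_1 - \lambda_2)}{1 + \rho\lambda_1} < 1$ for all $y$ in (171), we replace the denominator in (171) with a geometric series to result in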

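$$\delta \approx \frac{\rho A}{1 + \rho\lambda_1} \sum_{i=0}^{\infty} \gamma^i \int_0^{\frac{\lambda_1 - \lambda_2}{A}} y^i (1 - y^{N_t - 1})^m \, dy. \tag{172}$$

Upon elementary transformations of the integrand, (172) is written as

$$\delta \approx \frac{2\rho A}{(N_t - 1)(1 + \rho\lambda_1)} \sum_{i=0}^{\infty} \gamma^i \int_0^{\theta_{\max}} \cos^{2m+1}(\theta) \sin^{p_i}(\theta) \, d\theta \tag{173}$$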



where $\theta_{\max}$ is as in (139) and $p_i = \frac{2(i+1)}{N_t - 1} - 1$. Computing this integral in closed form using [50, 2.511(4), p. 131], we have



pA 



(iV,-l)(l + pAO ^ m + ^ 



2^^ ■ m(m — 1) ■ ■ ■ {m — k + 1] 



-D 



(174) 



(2m + Pi - 1) ■ ■ ■ (2m + pi - 2fc + 1) ' 
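Method 2: Alternatively, expanding the numerator in (171) using the binomial theorem, we have

\begin{align}
\delta &\approx \frac{\rho A}{1 + \rho\lambda_1} \int_0^{\frac{\lambda_1 - \lambda_2}{A}} \sum_{k=0}^{m} \binom{m}{k} (-1)^k \frac{y^{(N_t - 1)k}}{1 - \gamma y} \, dy \tag{175} \\
&= \frac{\rho A}{1 + \rho\lambda_1} \sum_{k=0}^{m} \binom{m}{k} (-1)^k \frac{\left( \frac{\lambda_1 - \lambda_2}{A} \right)^{(N_t - 1)k + 1}}{(N_t - 1)k + 1} \, {}_2F_1\left( 1, (N_t - 1)k + 1; (N_t - 1)k + 2; \frac{\rho(\lambda_1 - \lambda_2)}{1 + \rho\lambda_1} \right) \tag{176}
\end{align}

where the second equation follows from [50, 3.194(5), p. 285], and

$${}_2F_1(a, b; c; z) = \sum_{n=0}^{\infty} \frac{(a)_n (b)_n}{(c)_n} \frac{z^n}{n!} \tag{177}$$

is the Gauss hypergeometric function with $(a)_n$ denoting the Pochhammer symbol:

$$(a)_n = a (a+1) \cdots (a + n - 1), \quad n \geq 1, \qquad (a)_0 = 1. \tag{178}$$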

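Using the definition of the hypergeometric function [50, 9.100, p. 1039], we have

$$\delta \approx \frac{\rho A}{1 + \rho\lambda_1} \sum_{k=0}^{m} \binom{m}{k} (-1)^k \left( \frac{\lambda_1 - \lambda_2}{A} \right)^{(N_t - 1)k + 1} \sum_{\ell=0}^{\infty} \frac{1}{(N_t - 1)k + 1 + \ell} \left( \frac{\rho(\lambda_1 - \lambda_2)}{1 + \rho\lambda_1} \right)^{\ell}. \tag{179}$$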



The second expansion suffers from numerical instabilities due to the oscillatory nature (changing 
signs) of terms in the expansion. 
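The numerical instability of Method 2 can be made concrete. The sketch below evaluates (171) by a midpoint rule and compares it with the expansion (179), truncating the inner series after a fixed number of terms. It also reports the largest magnitude among the alternating outer terms; for larger $m$ this is many orders of magnitude above $\delta$ itself, so double-precision cancellation destroys the result. The eigenvalues, $\rho$, the truncation length and the function names are assumptions for illustration.

```python
import math

def params(lam, rho):
    """Nt, gamma = rho A / (1 + rho lam_1), and c = (lam_1 - lam_2) / A from (171)."""
    nt = len(lam)
    A = math.prod(lam[0] - lj for lj in lam[1:]) ** (1.0 / (nt - 1))
    gamma = rho * A / (1.0 + rho * lam[0])
    c = (lam[0] - lam[1]) / A
    return nt, gamma, c

def delta_171(lam, rho, m, n=100_000):
    """Midpoint rule for (171), written as gamma * int_0^c (1 - y^(Nt-1))^m / (1 - gamma y) dy."""
    nt, gamma, c = params(lam, rho)
    h = c / n
    acc = 0.0
    for j in range(n):
        y = (j + 0.5) * h
        acc += (1.0 - y ** (nt - 1)) ** m / (1.0 - gamma * y)
    return gamma * acc * h

def delta_method2(lam, rho, m, n_terms=200):
    """(179) with the inner series truncated after n_terms terms.
    Returns the estimate and the largest |term| of the alternating outer sum."""
    nt, gamma, c = params(lam, rho)
    gc = gamma * c                        # = rho (lam_1 - lam_2) / (1 + rho lam_1) < 1
    terms = []
    for k in range(m + 1):
        b = (nt - 1) * k + 1
        inner = sum(gc ** l / (b + l) for l in range(n_terms))
        terms.append(math.comb(m, k) * (-1) ** k * c ** b * inner)
    return gamma * sum(terms), gamma * max(abs(t) for t in terms)

lam, rho = [3.0, 1.0, 0.5], 1.0           # hypothetical eigenvalues (Nt = 3) and SNR
d_num = delta_171(lam, rho, 8)
d_m2, _ = delta_method2(lam, rho, 8)
print(d_num, d_m2)                        # agree for small m
d64 = delta_171(lam, rho, 64)
big64 = delta_method2(lam, rho, 64)[1]
print(big64 / d64)                        # size of the cancelling terms relative to delta
```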

Correction Term: The expression for the correction term in (70)-(71) and its trend in (72) follow along exactly the same lines as the proof of Theorem 1. Thus, the details are not provided here. ■



G. Proof of Prop. 3

In the $N_t = 2$ case, as $B$ increases, two possibilities arise depending on the relationship between $\rho$, $\lambda_1$ and $\lambda_2$. In the first case, if

$$z < 1 \iff \rho(\lambda_1 - 2\lambda_2) < 1, \tag{180}$$

using a Taylor series approximation for $\log_e(1 + z)$, we have

, f^y.^^^ as.) 

\Ai — A2/ m + 1 m + 1 

On the other hand, if

$$z > 1 \iff \rho(\lambda_1 - 2\lambda_2) > 1, \tag{182}$$

using the fact that

$$\log_e(1 + z) = \log_e(z) + \log_e\left(1 + \frac{1}{z}\right), \tag{183}$$

we have 

t=0 ^ ' 

m -j^ 

^logJ^) + i / A- 1 ^ 

- z"^ V z) z^^{m-2t-l) 

where the second step follows from the following reasoning:

$$\frac{1}{m - 2t} < \frac{1}{m - 2t - 1}, \quad t = 0, \ldots, \frac{m}{2} - 1. \tag{186}$$

We approximate the sum in (185) as $B \to \infty$ by the following integral:

\0g^{z) + i ^ f Z - 1 

~2r I jr. ^ - t 



5 - 1 ^ + ( ^ 1 / l^iz^dt (187) 



with $a = 2\log_e(z) > 0$. Estimating the above integral from [50, 3.252(5-6), p. 311], we have

'z-1 



+ 



2z 

z-l 
2 ■ 



■ e 



-(m.-l)\og^(z) 



Ei((m-l)log,(2;)) -Ei(-log,(z)) 
Ei(log,(^)) + li(^™-i) 



(188) 
(189) 



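where $\mathrm{Ei}(x) = \int_{-\infty}^{x} \frac{e^t}{t} \, dt$ and $E_1(x) = -\mathrm{Ei}(-x)$ denote the exponential integral functions, and $\mathrm{li}(x) = \int_0^x \frac{dt}{\log_e(t)}$ denotes the logarithmic integral function. From [51, p. 231], we have

$$\mathrm{li}(x) \underset{x \to \infty}{\asymp} \frac{x}{\log_e(x)} \tag{190}$$

and hence

$$\delta \underset{B \to \infty}{\asymp} \frac{z - 1}{2 \cdot z \log_e(z) \cdot (m - 1)}. \tag{191}$$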
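Both special functions above are easy to evaluate for positive arguments. Assuming the convergent series $\mathrm{Ei}(x) = \gamma_{\mathrm{E}} + \log_e(x) + \sum_{k \geq 1} x^k / (k \cdot k!)$ for $x > 0$ (with $\gamma_{\mathrm{E}}$ the Euler-Mascheroni constant) and the relation $\mathrm{li}(x) = \mathrm{Ei}(\log_e x)$ for $x > 1$, the sketch below illustrates the slow approach of $\mathrm{li}(x) \log_e(x) / x$ to 1 in (190). Function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def ei(x, tol=1e-17):
    """Ei(x) for x > 0 via the series gamma_E + ln x + sum_{k>=1} x^k / (k * k!)."""
    total, term, k = 0.0, 1.0, 0
    while True:
        k += 1
        term *= x / k                 # term = x^k / k!
        total += term / k
        if term / k < tol * total:    # stop once the tail is negligible
            break
    return EULER_GAMMA + math.log(x) + total

def li(x):
    """Logarithmic integral li(x) = Ei(ln x) for x > 1."""
    return ei(math.log(x))

for x in (1e4, 1e8, 1e16):
    print(x, li(x) / (x / math.log(x)))   # ratio drifts toward 1, slowly
```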

In the general $N_t$ case, it is easier to capture the asymptotic trends of $\Delta_2$ using the expression obtained from Method 1. For this, we first write

2^ ■ m{m - 1) ■ ■ ■ (m - A; + 1) T{m -k + ^) 



r(m + 



i+l 



■ m — k\ 



n;=i(2m-2j+p, + l) . 
Ignoring the term corresponding to $D^m$ in the inner sum in (174), $\delta$ can be rewritten as



(192) 



6 



^ 00 
1=1 



-fj,i 



, rn!r(fc + ^) 
,.=0 k\T{m + l + ^] 



1 



(193) 



where $\mu = \log_e\left( \frac{1 + \rho\lambda_1}{\rho A} \right) > 0$. Split the outer sum into two parts, $1 \leq i \leq N_t - 1$ and $i \geq N_t$, and the inner sum into two parts, $k = 0$ and $k \geq 1$, and denote the corresponding contributions to $\delta$ by $\delta_j$, $j = 1, \ldots, 4$, respectively.
With respect to $\delta_1$, we have



Nt-l 



61 



1=1 
Nt-^ 



-fii 



ml 



r(m + i + ^) 



e ^ m 



(194) 
(195) 



where the second line follows from the fact that $\Gamma(x)$ is monotonically decreasing in $0 < x < 1$ [51] and from Stirling's formula for $\Gamma(\cdot)$. For $\delta_2$, we have

5. 4 - V e-^' ■ — (197) 



* " i=Nt " ' ^ ' Nt- 

^ 00 



i=Nt 

1 

< 



(198) 



y e- . ^ . ^ (199) 

i=Nt 

1 1 g-M{A^t-l) 

< (200) 

where the second line follows from the definition of the Beta function, and the third line follows from the fact [52] that $\beta(x, y) \leq \frac{1}{xy}$ if $x \geq 1$ and $y \geq 1$. We now use the fact [53] that

> r^^-^ < . < 1 (201) 

to bound $\delta_3$ as follows:

A'.-ifer(m + i + ^)^ r(A + l) 

1 Nt~l -p/ m-l 



D 1 

* i=i 



L» 1 l^e^^^ 



(205) 



1 — D Nt — 1 e^TTjiVt-i _ 1 

where the third line follows from Stirling's formula for r(-) and the fact that k^t-^ is a 
decreasing function of k. 
For $\delta_4$, we have



m— 1 



= ^ E E nw'^^'t'^ . (206) 

^ 1 y> e-^- ■ ml ■ e'-y {k + 

- A^*-l,t^, r(m + y + l) ^ (A; + l)'^+^ 

where we use $y$ in (206) to denote a quantity $y > 1$, and the second line follows from [54]: if $b > a > 1$, we have



1 



T(b) , 
T(a) a"- 2 



Using the fact that ''^^^^''^fc+i^ monotonically increasing in k for any ?/ > 1, we have 



< 



C3 ■ > T ■ ^ r- (210) 

T{m + y + l) (m+m+i 



^ e-^* ■ (m + 1)"^+^ ■ e (m + 

C3 ■ / i — ■ i~ (211) 

jt^t im + y + l)"'+y+2 (m + l)'"+2 



00 



-II? / I \ 'm-+y~i 

■ p I m + y \ 2 



y " ■ '"^^ (212) 

m + y + 1 ym + y+l ' 



°° — //i / I 1 

"^*-e f m + y 



C3 ■ y " ■ exp - "^"""^ (213) 

m + y + I \ m + y + 1 I 



< 



i=Nt 

C3 ^ C3 e-^(^*-i) 



- . y e-^' = — ^ • (214) 

1 ^ m + 1 1 - e-^ 



m + 1 m + 1 1 — 

where $C_3$ is a constant, the third line follows by using Stirling's formula for $\Gamma(m+1)$ (as a function of $m+1$) and $\Gamma(m+y+1)$ (as a function of $m+y$), and the fifth line follows from the fact that

$$(1 + x)^{1/x} \leq e, \quad x > 0. \tag{215}$$

Putting together the trends of $\delta_j$, $j = 1, \ldots, 4$, we obtain the conclusion in the statement of the proposition.

With respect to Method 2, we approximate the inner sum in (179) by an appropriate reformulation of the exponential integral, and as $B \to \infty$, we have



X 



^{(N,-i)k+i), . _ ^ (216) 



where $\mu = \log_e\left( \frac{1 + \rho\lambda_1}{\rho(\lambda_1 - \lambda_2)} \right) > 0$. The oscillatory nature (changing signs) of the terms in (217) and the intractable nature of the exponential integral (for general values of the argument) imply that it is much harder to obtain insights on the asymptotic trends of $\Delta_2$ with (217) than with the expression from Method 1. ■



H. Proof of Prop.^ 

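We can rewrite the distribution function relevant in computing $\Delta_{1,\text{sk}}$ as follows:

\begin{align}
\Pr\left( \frac{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A} \mathbf{f}}{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{A} \mathbf{f}} < x \,\middle|\, \mathbf{f}^\dagger \mathbf{f} = 1 \right) &= 1 - \Pr\left( \mathbf{f}^\dagger \mathbf{A}^\dagger (\mathbf{H}^\dagger \mathbf{H} - x\mathbf{I}) \mathbf{A} \mathbf{f} > 0 \,\middle|\, \mathbf{f}^\dagger \mathbf{f} = 1 \right) \tag{218} \\
&= 1 - \Pr\left( \mathbf{f}^\dagger \mathbf{B}_x \mathbf{f} > 0 \,\middle|\, \mathbf{f}^\dagger \mathbf{f} = 1 \right) \tag{219}
\end{align}

where $\mathbf{B}_x$ is defined as

$$\mathbf{B}_x = \mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A} - x \mathbf{A}^\dagger \mathbf{A}. \tag{220}$$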

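Remark 1: Note that $\mathbf{B}_x$ is Hermitian, but in general not positive semi-definite. In fact, $\mathbf{B}_x$ has the same number of positive, negative, and zero eigenvalues as $(\mathbf{H}^\dagger \mathbf{H} - x\mathbf{I}) \mathbf{A} \mathbf{A}^\dagger$, which is the same as that of $\mathbf{H}^\dagger \mathbf{H} - x\mathbf{I}$ (see [55, Theorem 7.6.3, p. 465] for details). Using an eigen-decomposition of $\mathbf{B}_x$ of the form $\mathbf{B}_x = \mathbf{V}_x \boldsymbol{\Gamma}_x \mathbf{V}_x^\dagger$ in the special case of $N_t = 2$, where $\boldsymbol{\Gamma}_x = \operatorname{diag}([\Gamma_{1,x}, \Gamma_{2,x}])$ with $\Gamma_{1,x} \geq \Gamma_{2,x}$, we have:

1) $\Gamma_{1,x} > 0 = \Gamma_{2,x}$ if $x = \lambda_2(\mathbf{H}^\dagger \mathbf{H})$,
2) $\Gamma_{1,x} > 0 > \Gamma_{2,x}$ if $x \in (\lambda_2(\mathbf{H}^\dagger \mathbf{H}), \lambda_1(\mathbf{H}^\dagger \mathbf{H}))$,
3) $\Gamma_{1,x} = 0 > \Gamma_{2,x}$ if $x = \lambda_1(\mathbf{H}^\dagger \mathbf{H})$.

■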

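Thus, we can rewrite (219) as

\begin{align}
\Pr\left( \frac{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A} \mathbf{f}}{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{A} \mathbf{f}} < x \,\middle|\, \mathbf{f}^\dagger \mathbf{f} = 1 \right) &= 1 - \Pr\left( |\mathbf{f}(1)|^2 \Gamma_{1,x} + |\mathbf{f}(2)|^2 \Gamma_{2,x} > 0 \,\middle|\, \mathbf{f}^\dagger \mathbf{f} = 1 \right) \tag{221} \\
&= 1 - \Pr\left( |\mathbf{f}(1)|^2 (\Gamma_{1,x} - \Gamma_{2,x}) > -\Gamma_{2,x} \,\middle|\, \mathbf{f}^\dagger \mathbf{f} = 1 \right) \tag{222} \\
&= 1 - \Pr\left( |\mathbf{f}(1)|^2 > \frac{|\Gamma_{2,x}|}{|\Gamma_{1,x}| + |\Gamma_{2,x}|} \,\middle|\, \mathbf{f}^\dagger \mathbf{f} = 1 \right) \tag{223}
\end{align}

where the first step uses the unitary invariance of the distribution of $\mathbf{f}$ (so that $\mathbf{V}_x^\dagger \mathbf{f}$ has the same distribution as $\mathbf{f}$), and the last step follows from noting that $\Gamma_{1,x} \geq 0$ and $\Gamma_{2,x} \leq 0$ for all $x \in [\lambda_2(\mathbf{H}^\dagger \mathbf{H}), \lambda_1(\mathbf{H}^\dagger \mathbf{H})]$. We now use [8, Lemmas 2 and 4] to compute the above term (see Appendix B for details) as

$$\Pr\left( |\mathbf{f}(1)|^2 > \frac{|\Gamma_{2,x}|}{|\Gamma_{1,x}| + |\Gamma_{2,x}|} \,\middle|\, \mathbf{f}^\dagger \mathbf{f} = 1 \right) = \frac{|\Gamma_{1,x}|}{|\Gamma_{1,x}| + |\Gamma_{2,x}|}. \tag{224}$$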

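Thus, $\Delta_{1,\text{sk}}$ can be expressed as

$$\Delta_{1,\text{sk}} = \int_{\lambda_2(\mathbf{H}^\dagger \mathbf{H})}^{\lambda_1(\mathbf{H}^\dagger \mathbf{H})} \left( \frac{|\Gamma_{2,x}|}{|\Gamma_{1,x}| + |\Gamma_{2,x}|} \right)^m dx. \tag{225}$$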
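For $N_t = 2$, (225) can be cross-checked by simulating a skewed RVQ codebook directly. The sketch uses a hypothetical $\mathbf{H}^\dagger \mathbf{H}$ and a hypothetical invertible skewing matrix $\mathbf{A}$. It computes $\Gamma_{1,x} \geq \Gamma_{2,x}$ as the eigenvalues of $\mathbf{B}_x$ in (220) and compares the integral in (225) with a Monte Carlo estimate of $\lambda_1(\mathbf{H}^\dagger \mathbf{H})$ minus the expected best ratio over codebooks of $m$ i.i.d. isotropic vectors. It assumes $\Delta_{1,\text{sk}}$ is this unnormalized gap, and the plain-Python helpers handle $2 \times 2$ matrices only.

```python
import math
import random

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adjoint(P):
    return [[complex(P[j][i]).conjugate() for j in range(2)] for i in range(2)]

def eig2(M):
    """Eigenvalues (descending) of a 2x2 Hermitian matrix."""
    a, d, b = M[0][0].real, M[1][1].real, M[0][1]
    mid, rad = (a + d) / 2.0, math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    return mid + rad, mid - rad

def quad(M, f):
    """f^dagger M f (real for Hermitian M)."""
    return sum(f[i].conjugate() * M[i][j] * f[j] for i in range(2) for j in range(2)).real

R = [[3.0, 1.0], [1.0, 2.0]]              # hypothetical H^dagger H
A = [[1.0, 0.5], [0.0, 1.0]]              # hypothetical invertible skewing matrix
M = mat_mul(adjoint(A), mat_mul(R, A))    # A^dagger H^dagger H A
N = mat_mul(adjoint(A), A)                # A^dagger A
lam1, lam2 = eig2(R)
m = 4                                     # B = 2

def delta_sk_formula(n=20_000):
    """Midpoint rule for (225), with Gamma_{1,x} >= Gamma_{2,x} the eigenvalues of M - x N."""
    h = (lam1 - lam2) / n
    acc = 0.0
    for j in range(n):
        x = lam2 + (j + 0.5) * h
        g1, g2 = eig2([[M[r][c] - x * N[r][c] for c in range(2)] for r in range(2)])
        acc += (abs(g2) / (abs(g1) + abs(g2))) ** m
    return acc * h

def delta_sk_monte_carlo(trials=20_000, seed=1):
    """lam_1 - E[max_i ratio_i] for codewords A f_i / ||A f_i|| with f_i i.i.d. isotropic."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        best = float("-inf")
        for _ in range(m):
            f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
            best = max(best, quad(M, f) / quad(N, f))
        acc += lam1 - best
    return acc / trials

print(delta_sk_formula(), delta_sk_monte_carlo())
```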

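Now observe that $\Delta_{1,\text{sk}}$ is monotonically increasing as a function of $|\Gamma_{2,x}| / |\Gamma_{1,x}|$. Thus, an upper bound on $|\Gamma_{2,x}| / |\Gamma_{1,x}|$ also results in a corresponding upper bound on $\Delta_{1,\text{sk}}$. For this, note that

\begin{align}
\Gamma_{2,x} &= \lambda_2(\mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A} - x\mathbf{A}^\dagger \mathbf{A}) \tag{226} \\
&\geq \lambda_2(\mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A}) - x\lambda_1(\mathbf{A}^\dagger \mathbf{A}) \tag{227}
\end{align}

where the second step follows from a routine application of Weyl's inequality [55]. Since the right-hand side of (227) is non-positive for all $x$ in the range of integration, we thus have

$$|\Gamma_{2,x}| \leq x\lambda_1(\mathbf{A}^\dagger \mathbf{A}) - \lambda_2(\mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A}). \tag{228}$$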

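For $\Gamma_{1,x}$, we use [56, Corollary 11] to see that

$$\Gamma_{1,x} \geq (\lambda_1(\mathbf{H}^\dagger \mathbf{H}) - x) \cdot \lambda_1(\mathbf{A}\mathbf{A}^\dagger). \tag{229}$$

Note that the bounds in (228) and (229) are non-trivial (that is, the bounding terms are non-negative). Combining them, we have

$$\frac{|\Gamma_{2,x}|}{|\Gamma_{1,x}|} \leq \frac{x\lambda_1(\mathbf{A}^\dagger \mathbf{A}) - \lambda_2(\mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A})}{(\lambda_1(\mathbf{H}^\dagger \mathbf{H}) - x) \cdot \lambda_1(\mathbf{A}\mathbf{A}^\dagger)}. \tag{230}$$

Using the bound in (230), after a routine integral computation, it is straightforward to see that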



Ai,sk < 



1 



m + 1 



1 



A2(AtHtHA) 
Ai(HtH) • Ai(AAt) 



'1 - (CD 



A2(HtH) 
Ai(HtH) 



(231) 



Ai,sk 



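where $m = 2^B$ and

$$C_4 = \frac{\lambda_2(\mathbf{H}^\dagger \mathbf{H}) \cdot \lambda_1(\mathbf{A}\mathbf{A}^\dagger) - \lambda_2(\mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A})}{\lambda_1(\mathbf{H}^\dagger \mathbf{H}) \cdot \lambda_1(\mathbf{A}\mathbf{A}^\dagger) - \lambda_2(\mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A})}. \tag{232}$$

Since $C_4 < 1$ and $B \to \infty$, the conclusion in (81) is immediate.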



I. Proof of Theorem 4

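For any integer $k \geq 1$, define the following expectation over the ensemble of RVQ codebooks:

$$G_k = \sqrt[k]{E\left[ \left( \frac{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A} \mathbf{f}}{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{A} \mathbf{f}} \right)^k \right]}. \tag{233}$$

It is easy to check that

\begin{align}
\lim_{k \to \infty} G_k &= \lambda_1(\mathbf{H}^\dagger \mathbf{H}), \tag{234} \\
G_1 &\geq \lambda_{N_t}(\mathbf{H}^\dagger \mathbf{H}), \tag{235}
\end{align}

and it follows from Lyapunov's inequality [57, Prob. 28, p. 143] that $G_k$ is non-decreasing with $k$. Since

$$\lambda_{N_t}(\mathbf{H}^\dagger \mathbf{H}) \leq \frac{\lambda_1(\mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A})}{\lambda_1(\mathbf{A}^\dagger \mathbf{A})} \leq \lambda_1(\mathbf{H}^\dagger \mathbf{H}), \tag{236}$$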



it can be concluded that there exist some $K_L \geq 1$ and some $K_U$ satisfying $K_L \leq K_U < \infty$ such that

Ai(AtHtHA) 



G 



Kl-I 

B 



< 



Ai(AtA) 



<Gk, 



Ai(H^H) -2-- < 



G 



Ku 



< Ai(Hl'H). 



(237) 
(238) 
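The monotonicity of $G_k$ in $k$ also holds for samples, since Lyapunov's inequality applies to an empirical distribution as well. The sketch below assumes a hypothetical diagonal $\mathbf{H}^\dagger \mathbf{H}$ and diagonal $\mathbf{A}$, so that only $|\mathbf{f}(k)|^2$ matters, and replaces the expectation in (233) by a sample mean. Names and values are illustrative.

```python
import random

def sample_ratios(lam, a, trials=50_000, seed=2):
    """Samples of f^dagger A^dagger H^dagger H A f / f^dagger A^dagger A f for the hypothetical
    diagonal case H^dagger H = diag(lam), A = diag(a), with f isotropic.
    The ratio is scale-invariant, so unnormalized Exp(1) weights stand in for |f(k)|^2."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        w = [rng.expovariate(1.0) for _ in lam]
        num = sum(wk * ak * ak * lk for wk, ak, lk in zip(w, a, lam))
        den = sum(wk * ak * ak for wk, ak in zip(w, a))
        out.append(num / den)
    return out

def G(samples, k):
    """Empirical version of G_k in (233): (mean of X^k)^(1/k)."""
    return (sum(x ** k for x in samples) / len(samples)) ** (1.0 / k)

lam = [3.0, 1.5, 0.5]      # hypothetical eigenvalues of H^dagger H
a = [0.5, 1.0, 2.0]        # hypothetical diagonal skewing matrix
xs = sample_ratios(lam, a)
gs = [G(xs, k) for k in (1, 2, 4, 8, 16, 32, 64)]
print([round(g, 4) for g in gs])   # non-decreasing, between lam_Nt and lam_1
```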



Thus, $\Delta_{1,\text{sk}}$ can be bounded as
Ai,sk-Ai(HtH) < 

^A]v,(HtH 



Aj^(AtHtHA) 
Ai(AtA) 



]V,(HtH) 



ZftAtHtHAf , . 
V ftAtAf - ' 



dx 



k=Ki_ 



Pr 



ftA^HtHAf 



+ 



Ai,sk. 



Ai(HtH) 



Gf. 



Pr 



ftAtAf 

ZftAtHtHAf 
V ftAtAf 



< xlftf = 1 



dx 



< xlf^f = 1 



dx 



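For the first term of $\Delta_{1,\text{sk}}$ in (239) (denoted as $T_1$), since $\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{A} \mathbf{f} \leq \lambda_1(\mathbf{A}^\dagger \mathbf{A})$, we have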



Pr 



ftAtHtHAf 
ftAtAf 



< X 



ftf = 1 < Pr 



/ftAtHtHAf 



< X 



ftf = 1 



V Ai(AtA) 

(Ai(AtHtHA) -a;Ai(AtA))'^*"' 
n£2^i(A^HtHA) - Aj(AtHtHA) 



(239) 
(240) 

(241) 
(242) 



where the second step follows from an application of Lemma 2 to the distribution of $\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A} \mathbf{f}$. Using a computation that mirrors that in Theorem 1, we have



B^oo K ■ 2 ^t-i 



Nt-1 
with K = r(-|y3Y) 



Ai(AtHtHA) 
Ai(AtA) 



-A^JHtH- 



D. 



sk 



(243) 



^^ = ^-f[ ^'^^^^^^^^ ~ ^^*(H^H) ■ Ai(AtA) 

i=2 



(244) 



Ai(AtHtHA) - Aj(AtHtHA) 

The tightness of (243) follows from the tightness result established in Theorem 1.

For bounding the second term of (239) (denoted as $T_2$), we need a reverse Cauchy-Schwarz inequality, which is presented next.

Lemma 5: Let $X$ be a positive random variable. Let $g(\cdot): \mathbb{R}^+ \mapsto \mathbb{R}^+$ be a monotonically increasing function such that $g(X)$ and $(g(X))^2$ are integrable. If $x$ is such that $g(x) < E[g(X)]$, we have

$$\Pr(X > x) \geq \frac{\left( E[g(X)] - g(x) \right)^2}{E\left[ (g(X))^2 \right]}. \tag{245}$$



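Proof: Since $x$ is such that $E[g(X)] > g(x)$, using the standard Cauchy-Schwarz inequality and the monotonicity of $g(\cdot)$, we have

\begin{align}
E[g(X)] - g(x) &\leq E[g(X)] - E[g(X) \mathbf{1}(X \leq x)] \tag{246} \\
&= E[g(X) - g(X) \mathbf{1}(X \leq x)] \tag{247} \\
&= E[g(X) \mathbf{1}(X > x)] \tag{248} \\
&\leq \sqrt{E[(g(X))^2] \cdot \Pr(X > x)} \tag{249}
\end{align}

where (246) uses $E[g(X) \mathbf{1}(X \leq x)] \leq g(x) \Pr(X \leq x) \leq g(x)$, which holds because $g(\cdot)$ is non-negative and increasing. Squaring and rearranging (249), we have the conclusion of the lemma. ■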
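Lemma 5 can be sanity-checked in a case where every quantity is available in closed form. Assuming $X \sim \mathrm{Exp}(1)$ and $g(x) = x^k$, we have $E[g(X)] = k!$, $E[(g(X))^2] = (2k)!$ and $\Pr(X > x) = e^{-x}$, so (245) reduces to an elementary inequality that the sketch verifies on a grid. The function name is illustrative.

```python
import math

def lemma5_bound_exp(x, k):
    """Right-hand side of (245) for X ~ Exp(1) and g(x) = x^k:
    E[g(X)] = k!, E[g(X)^2] = (2k)!, so the bound is (k! - x^k)^2 / (2k)!."""
    gap = math.factorial(k) - x ** k
    return gap * gap / math.factorial(2 * k)

# Lemma 5 applies when g(x) < E[g(X)], i.e. 0 <= x < (k!)^(1/k); here Pr(X > x) = exp(-x).
for k in range(1, 9):
    x_max = math.factorial(k) ** (1.0 / k)
    for j in range(100):
        x = x_max * j / 100
        assert math.exp(-x) >= lemma5_bound_exp(x, k)
print("Lemma 5 bound holds on the grid")
```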
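For each $k$ satisfying $K_L \leq k \leq K_U$, we repeatedly apply Lemma 5 with

$$X = \frac{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A} \mathbf{f}}{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{A} \mathbf{f}} \tag{250}$$

and $g(x) = x^k$ to get the following bound for $T_2$: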

r2<E 

k=K^ 



dx (251) 



v^^r^i^i!!::^ (252) 

^ /o ((G,)^-y(G2fc)'=)^ 



fc=_ft:L 



where $I_k = \left( \frac{G_k}{G_{2k}} \right)^k$, the second equation follows from the transformation $y \mapsto \frac{x^k}{(G_{2k})^k}$, and the third step follows by trivially bounding $y \leq I_k$. Note that the monotonicity of $G_k$ with $k$ implies that $I_k \leq 1$. With the transformation $y \mapsto \sin(\theta)$, we can reuse the computation in Theorem 1 to estimate $T_2$. However, this estimate is not sufficient for our purpose and hence, we will establish a tighter estimate now.

From [50, 2.512(3), p. 131] and Stirling's formula for $h(m)$ in (146), we have

m— 1 



y^_(G2^ L'fiO^zlH] ,254) 



00 g-afcX 



7^.(2,n+l)xE T:^'^":.M l+ ^<*:^ (255) 



Since $h(j) \asymp \sqrt{\pi j}$ for $j$ large, we can estimate (254) by



^ (G2fc)^-0F^-4 A ^ r(l/2,afc) ^ 

{G2k)'-V^-Ik A ^ l-erf(V^) ^ 
k.{Gk.,r-^ ['^ ^k ) ^'''^ 



where $a_k$ is as given in (260), and

$$\Gamma(a, x) = \int_x^{\infty} t^{a-1} e^{-t} \, dt \tag{258}$$

is the incomplete Gamma function. The second step follows from [51, 6.5.3, p. 260], and

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt \tag{259}$$

is the error function. The third step follows from [51, 6.5.17, p. 262]. Note that as $B$ increases, $K_U$ increases and $I_k \to 0$. As a result, we have



«fc = log,(l + ^^) X 4^ (260) 



In this setting, from [51, 7.1.6, p. 297], we thus have 



for some constant $C_5$. Using the relationships in (237) and (238), we can write (261) as

^^^-r ( (G.kJ''^ , ^^^ Ai(HtH) f X,{WH)X,{AA^) \'\ 

^ i i^L ■ + ^ MAtHtHA) ) ) ^'^'^ 



<2-f -Cs-AilHtH). I — 



1_ ( Ai(HtH) Y"'' 1 / Ai(HtH)Ai(AAt) y \ 

fLVA^.(HtH)y' + L.^k+l\ Ai(AtHtHA) ) j 



\(AtHtHA) J ''''' 



(263) 

^/Al(HtH)■Al(AAt)^^ 
A 

where we have used the symbolic notation $G(\cdot)$ to denote the monotonically increasing function in (263) for a given $\mathbf{H}$. The tightness of (264) is due to the tight estimation of the integral in (253).

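For the third term of (239) (denoted as $T_3$), we trivially over-bound $\Pr\left( \frac{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{H}^\dagger \mathbf{H} \mathbf{A} \mathbf{f}}{\mathbf{f}^\dagger \mathbf{A}^\dagger \mathbf{A} \mathbf{f}} < x \,\middle|\, \mathbf{f}^\dagger \mathbf{f} = 1 \right)$ by 1 and use the definition of $K_U$ to obtain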

r3<Ai(HtH)-G;,, <2-f. (265) 

Combining the three terms $T_1$, $T_2$ and $T_3$, we have

Ai,sk-Ai(HtH) < Ai,3k-Ai(HtH) (266) 

B , ^^ Ai(HtH)Ai(AAt) ^^^ , 
^ ^ l^^ + ^l A,(AtHtHA) J j+^V^^ 

Ai(A^HtHA) , .utmV^^ ^ 



A.(AtA) ^--(^^^)J-l^^ (l-/^.)C/V.-l) J- ^^^^^ 
If $N_t > 4$, it is clear that the first term in (267) is sub-dominant relative to the second term. The statement of the theorem hence follows. ■
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